Description

The present invention relates generally to a method of characterising a sample of genomic DNA and to nucleotide
sequences employed in the method as well as kits comprising these. In particular the invention involves the use of
primers which selectively prime specific type(s) of internal repeat unit in a tandemly repeated region. The method of
the invention is particularly useful in forensic or paternity studies and provides individual sample codes suitable for
computerised storage on, and retrieval from, a database. The invention also relates to databases comprising the
individual sample codes and to computers when programmed to use the above sample codes and databases. The invention
also provides a novel method for the detection of diagnostic base sequences in one or more nucleic acids contained in
a sample.

Hypervariable human DNA markers are capable of identifying individuals with a high degree of specificity and have
had a profound impact on forensic and legal medicine, both for providing evidence for associations or exclusions
between forensic evidence and criminal suspects, and for establishing kinship in, for example, paternity disputes. Most
of the hypervariable loci used in DNA profiling are tandem-repetitive minisatellite or VNTR (variable number tandem
repeat) loci which can show extreme levels of allelic variability in repeat copy number and therefore DNA fragment
length (Nakamura et al., 1987b; Wong et al., 1987). Multilocus probes (MLPs) capable of detecting multiple
hypervariable minisatellites to produce a DNA fingerprint, and single locus minisatellite probes (SLPs) which reveal allelic length
variation at individual hypervariable loci to produce much simpler DNA profiles, have been extensively used in casework
(Jeffreys et al., 1985a, 1985b - UK Patent 2166445/Lister Institute, 1991a; Wong et al., 1987 - UK Patent 2188323/ICI
plc). Amplification of hypervariable loci using the polymerase chain reaction (PCR) has greatly increased the sensitivity
of DNA typing systems (Jeffreys et al., 1988; Boerwinkle et al., 1989) and has permitted the development of new
classes of variable "microsatellite" DNA markers based on simple tandem repeat loci with very short alleles (Litt and
Luty, 1989; Tautz, 1989; Weber and May, 1989).

Despite the power of current DNA typing systems, technical problems have prevented their full potential from being
realised. MLPs generate complex multi-band DNA fingerprints from Southern blots of human genomic DNA which have
proved to be very effective in determining family relationships (Jeffreys et al., 1985c, 1991a). However, these probes
have proved less useful in forensic investigations due to the relative lack of probe sensitivity, difficulties in comparing
DNA fingerprints between blots and major problems in converting the complex patterns into a form appropriate for
computer databasing (see Jeffreys et al., 1991b). These problems have been largely overcome using the 1-2 band DNA
profiles generated by Southern blot analysis with SLPs, but other limitations remain. First, allele lengths at
hypervariable loci can vary in a quasi-continuous fashion in human populations, making unequivocal allele identification
impossible (Baird et al., 1986; Wong et al., 1987; Balazs et al., 1989; Odelberg et al., 1989; Smith et al., 1990). In addition,
variation in electrophoretic mobility between DNA samples will introduce errors in allele length estimates; such
"band-shifts" can occasionally lead to apparent exclusions between the DNA profiles of a forensic specimen and a criminal,
which can, in general, only be evaluated using empirical statistical information on the magnitude of such sizing errors
generated from extensive validation surveys (Lander, 1989; Budowle et al., 1991). More seriously, error-prone allele
size estimates impede the comparison of DNA profile evidence gathered from different Southern blots, greatly
weakening the statistical power of population and criminal DNA profile databases, and preventing the unambiguous
comparison of DNA profile evidence between different forensic laboratories during the course of a criminal investigation.

Some PCR-based DNA typing systems can in principle circumvent these problems of error-prone allele sizing.
Thus microsatellites and other simple tandem repeat loci generate short PCR-amplifiable alleles which should be
classifiable with precision by sizing on DNA sequencing gels against an appropriate sequencing ladder (Litt and Luty, 1989;
Weber and May, 1989). However, most of the microsatellite loci, and particularly those based on dinucleotide repeats,
show complex multi-band patterns per allele on DNA sequencing gels which appear to arise through Taq polymerase
slippage at dinucleotide repeats during amplification (Weber and May, 1989) and through non-templated nucleotide
addition catalysed by Taq polymerase (Clark, 1988). As a result, it is sometimes difficult to determine with confidence
the true size of a given allele. More seriously, the level of allelic variability at microsatellites is very poor compared with
the most variable minisatellites; the most informative CA repeat locus identified to date shows only 12 different length
alleles (Litt and Luty, 1989), allowing the classification of individuals into only 78 distinct genotypes. This problem
cannot be overcome by amplifying hypervariable minisatellites, since the most variable loci tend to have large alleles (>5kb)
which are, in general, refractory to PCR amplification (Jeffreys et al., 1988).
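
By way of illustration only, the figure of 78 genotypes quoted above follows from counting unordered pairs of 12 length alleles (12 homozygous plus 66 heterozygous combinations). A minimal sketch of this arithmetic is given below; the function name is hypothetical and forms no part of the method described.

```python
def genotype_count(n_alleles: int) -> int:
    """Number of distinct unordered diploid genotypes for n distinguishable alleles.

    n homozygotes plus n*(n-1)/2 heterozygotes gives n*(n+1)/2 in total.
    """
    return n_alleles * (n_alleles + 1) // 2


# 12 length alleles at the CA repeat locus give 78 genotypes.
print(genotype_count(12))  # 78
```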

In addition to these technical problems, there has also been considerable debate over the statistical evaluation of
the population frequency of single locus DNA phenotype evidence (see for example Lander, 1989; Devlin et al., 1990;
Budowle et al., 1991). The general approach is to deduce, conservatively, appropriate allele frequencies (allowing for
allele sizing uncertainties) in a reference population database, and then to deduce genotype frequencies from allele
frequencies under the assumption that the population is at Hardy-Weinberg equilibrium. While most tests have failed to
reveal major apparent departures from Hardy-Weinberg equilibrium (Odelberg et al., 1989; Devlin et al., 1990; Budowle
et al., 1991; Chakraborty et al., 1991), the tests are relatively insensitive, particularly for rare genotypes with minimal or
zero representation in the population database. An alternative and more satisfactory approach would be to compare
evidentiary DNA phenotypes with very large databases of phenotypes gathered from population surveys and casework,
to determine match frequencies based on the frequency of observed phenotypes. Such an approach requires a system
capable of generating very large numbers of different and unambiguous DNA phenotypes.
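
The conventional Hardy-Weinberg calculation referred to above may be sketched as follows. This is an illustration under stated assumptions only: the allele names and frequencies are hypothetical values, not survey data, and the function name is ours.

```python
def genotype_frequency(allele_freqs: dict, allele_1: str, allele_2: str) -> float:
    """Expected genotype frequency under Hardy-Weinberg equilibrium.

    Homozygotes occur at p*p and heterozygotes at 2*p*q, where p and q are
    the population frequencies of the two alleles.
    """
    p = allele_freqs[allele_1]
    q = allele_freqs[allele_2]
    return p * p if allele_1 == allele_2 else 2 * p * q


# Hypothetical allele frequencies for a three-allele locus (illustrative only).
example_freqs = {"A1": 0.5, "A2": 0.3, "A3": 0.2}

print(genotype_frequency(example_freqs, "A1", "A1"))  # homozygote, p*p
print(genotype_frequency(example_freqs, "A1", "A2"))  # heterozygote, 2*p*q
```

The calculation depends entirely on the equilibrium assumption, which is why a direct count of observed phenotypes in a large database is described above as the more satisfactory approach.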

Minisatellite alleles frequently vary not only in repeat copy number but also in the interspersion pattern of variant
repeat units along alleles (Fig. 1A) (Owerbach and Aagaard, 1984; Jeffreys et al., 1985; Jarman et al., 1986; Wong et
al., 1986, 1987; Nakamura et al., 1987a; Page et al., 1987; Gray and Jeffreys, 1991). We have previously investigated
variation in allelic minisatellite variant repeat (MVR) maps at the hypervariable locus D1S8 (probe MS32 - claimed in
our UK Patent 2188323; Jeffreys et al., 1990). Alleles at this locus show two classes of repeat unit (a-type, t-type) which
differ by a single base substitution which creates or destroys a HaeIII restriction site. The interspersion pattern of
HaeIII+ and HaeIII- repeat units along an MS32 allele can be assayed by amplifying the entire allele, using amplimers
from the DNA flanking the minisatellite, followed by end-labelling the amplified allele, partial digestion with HaeIII, and
electrophoresis to display a ladder of labelled digest products extending from one of the flanking primer sites to each of
the HaeIII-cleavable repeat units. This approach provides an unambiguous binary code for an allele, and has revealed
very high levels of allelic variation in MS32 MVR maps, significantly greater than can be achieved by conventional
Southern blot analysis of human genomic DNA. Curiously, there is a polarity of variation along MS32 alleles; at one end,
there are relatively few distinct internal maps (haplotypes) in Caucasian populations, whereas the other end of alleles
shows far higher variability, suggesting a local mutational hot-spot responsible for altering allelic repeat unit copy number
and reshuffling the pattern of variant repeat units (Jeffreys et al., 1990). However, the above MVR mapping method has
proved to be cumbersome and can only be applied to MS32 alleles small enough (<5kb) to amplify by PCR (Jeffreys et
al., 1990).

It is therefore desirable to provide a further method of characterising a sample of genomic DNA which overcomes, 
at least in part, the above mentioned disadvantages. 

According to a first aspect of the present invention we provide a method of characterising a test sample of genomic
DNA which method comprises amplifying a tandemly repeated region, comprising more than one type of repeat unit,
as far as internal repeat units of a specific type so as to generate a set of amplification products which identify the
relative positions of the said internal repeat units within the tandemly repeated region, and separating the set of
amplification products to provide a sample code.

The set of amplification products is conveniently produced by contacting the test sample of genomic DNA with type
specific primer to prime selectively internal repeat units of a specific type, extending the said primers in the presence of
appropriate nucleoside triphosphates and an agent for polymerisation thereof to produce a set of amplification products
extending from the internal repeat units of a specific type to at least the end of the tandemly repeated region.
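
The relationship between the positions of internal repeat units and the sizes of the resulting amplification products may be sketched as below. This is a simplified model and not a description of the experimental protocol: the repeat unit length of 29 and the flanking length of 100 are illustrative assumptions, position 1 is taken to be the repeat unit nearest the end of the tandemly repeated region towards which the primers extend, and all function names are hypothetical.

```python
def type_specific_ladder(allele: str, repeat_type: str,
                         repeat_len: int = 29, flank_len: int = 100) -> list:
    """Sizes of products primed at every repeat unit of one type.

    `allele` is a string of repeat types (e.g. "attaat"), position 1 being the
    unit nearest the end of the repeated region. A product primed at position i
    spans i repeat units plus the assumed flanking length.
    """
    return [flank_len + pos * repeat_len
            for pos, unit in enumerate(allele, start=1)
            if unit == repeat_type]


def read_positions(ladders: dict, n_positions: int,
                   repeat_len: int = 29, flank_len: int = 100) -> list:
    """Recover the repeat type at each position from the type specific ladders."""
    code = []
    for pos in range(1, n_positions + 1):
        size = flank_len + pos * repeat_len
        # A rung in a given ladder at this size identifies the repeat type here.
        types = [t for t in sorted(ladders) if size in ladders[t]]
        code.append("/".join(types) if types else "none")
    return code


allele = "attaat"  # hypothetical single allele
ladders = {t: type_specific_ladder(allele, t) for t in ("a", "t")}
print(ladders["a"])  # [129, 216, 245]
print(ladders["t"])  # [158, 187, 274]
print(read_positions(ladders, len(allele)))
```

In this model the two ladders are complementary: every position appears as a rung in exactly one of them, which is what allows the ladder pattern to be read directly as a code.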

The type specific primer is an oligonucleotide prepared either by synthetic methods or derived from a naturally
occurring sequence, which is capable of acting as a point of initiation of synthesis when placed under conditions in
which synthesis of a primer extension product which is complementary to a nucleic acid strand is induced, i.e. in the
presence of appropriate nucleoside triphosphates and an agent for polymerisation in an appropriate buffer and at a
suitable temperature. In our European Patent, Publication No. 0332435, the contents of which are incorporated herein by
reference, we disclose and claim a method for the selective amplification of template sequences which differ by as little
as one base as well as type specific primers for use in the selective amplification method. Type specific primers for use
in the present invention may therefore be designed with reference to our above mentioned European Patent
Application, Publication No. 0332435. The selective amplification method is now commonly referred to as the Amplification
Refractory Mutation System (ARMS). ARMS is a trade mark of ICI plc.

The type specific primer conveniently includes a tail sequence which tail sequence does not hybridise to the
tandemly repeated region or to an adjacent region. By "an adjacent region" we mean a region sufficiently close to the
tandemly repeated region to act as template for primer extension which could adversely interfere with the method of the
invention.

The set of amplification products produced as above and which extends to a common locus flanking the tandemly
repeated region conveniently acts as template for a common primer which hybridises to the common locus and is
extended in the presence of appropriate nucleoside triphosphates and an agent for polymerisation thereof to amplify
the said set of amplification products. The above amplification procedures may be repeated as required. However,
amplification products may shorten progressively at each amplification cycle, due to the type specific primer priming
internally on amplification products from previous cycles. It has been found that this problem may be overcome by use
of a tail specific primer which hybridises to the complement of the tail sequence in the extension product of the common
primer and is extended in the presence of appropriate nucleoside triphosphates and an agent for polymerisation thereof
to amplify the common primer amplification products. In summary the tail sequence on the type specific primer is
selected so that its complement in the extension product of the common primer provides a convenient template for the
tail specific primer provided that the tail sequence and complementary sequences do not hybridise to the tandemly
repeated region or to an adjacent region. Examples of convenient tail sequence lengths include up to 50, up to 40, up
to 30 and up to 20 nucleotides.
The sets of amplification products prepared according to the above procedures are conveniently amplified in a
polymerase chain reaction using the common and tail specific primers as defined above. The polymerase chain
reaction is conveniently described in "PCR Technology", edited by Henry A. Erlich, published by Stockton Press,
London/New York, in 1989. The tail sequence on the type specific primer ensures that the tail specific primer primes internal
repeat units of the desired type at each amplification cycle.

The method of the present invention is conveniently effected in a single reaction using the type specific and
common primers in combination with the tail specific primer. It has been found that the ratio of tail specific and/or common
primer to type specific primer is conveniently more than 1:1. Thus, whilst we do not wish to be limited by theoretical
considerations, at each amplification cycle the amplification products are more likely to be primed by the tail specific primer
than by the type specific primer. Any internal priming off amplification products will produce authentic but relatively short
amplification products in each amplification cycle. Routine experimentation allows the molecular biologist of ordinary
skill to provide amplification products extending a desired distance into the tandemly repeated region of choice.
Examples of convenient ratios of tail specific and/or common primer to type specific primer include at least 20:1, at least 30:1
and at least 40:1, preferably at least 50:1.

The set of amplification products is separated to provide a sample code according to any convenient procedure
provided that the separation is carried out on the basis of the native (genomic) order of the individual repeat units of a
specific type within the tandemly repeated region. It will be appreciated that the sample code may be provided from any
convenient number of amplification products within the set and representing any convenient number of positions within
the native order. In general, separation is effected on the basis of the relative sizes of the amplification products and
these are conveniently separated via known gel electrophoresis techniques resulting in a ladder of amplification
products representing the sample code. Direct visualisation of the amplification products, for example using staining
procedures, and in particular ethidium bromide, is preferred. If required however the amplification products may be
identified using a probe which for example hybridises specifically to the tandemly repeated region or to a flanking
region. The probe may comprise any convenient radioactive label or marker component. Preferably a non-radioactive
label such as the triggerable chemiluminescent 1,2-dioxetane compound Lumi-Phos 530 disclosed and claimed in US
patent 4959182 is employed. Lumi-Phos 530 is a registered trade mark of Lumigen Inc.

The method of the present invention is preferably used to analyse at least two specific types of internal repeat unit
within the tandemly repeated region. This increases considerably the informativeness of the resulting sample code.
Where amplification is effected using type specific primers this also provides integral control of any mispriming on
non-type specific internal repeat units. Thus for example the amplification products are separated as above to provide two
or more type specific ladders of amplification products.

In general the method of the present invention is carried out with reference to one or more controls. In particular
the method is carried out with reference to a control sample of known profile. Thus for example where the amplification
products are provided as type specific ladders the positions of the individual "rungs" are compared with the ladder
profile for the control sample. The ladder profile for the control sample may also conveniently provide reference positions
throughout the tandemly repeated region for internal repeat units of a specific type comprised in the sample code.

The internal repeat units of a specific type and included in the sample code are conveniently of invariant length. 
This simplifies analysis of, for example, type specific ladder(s) of amplification products.

Specific types of internal repeat units may arise from base substitutions, deletions, translocations or similar events. 

Where the internal repeat units are of invariant length this generally arises from base substitution(s) within the repeat
units. Base substitutions may be detected using any known technique such as direct sequence analysis but are
conveniently identified with reference to the presence or absence of restriction sites within the internal repeat units (Jeffreys
et al., 1990). Thus, for example the MS32 minisatellite claimed in our UK patent no. 2188323 comprises two types of
repeat unit of invariant length which are HaeIII cleavable and HaeIII resistant respectively. Therefore in a preferred
aspect of the method of the present invention the tandemly repeated region is comprised in the MS32 minisatellite and
two specific types of internal repeat unit arise from the presence or absence of a HaeIII site within internal repeat units.
The type specific primers conveniently comprise the following sequences 3'CGGTCCCCACTGAGT 5' and
3'TGGTCCCCACTGAGT 5'. Examples of preferred type specific primers comprising the above sequences are
3'CGGTCCCCACTGAGTCTTAC 5' and 3'TGGTCCCCACTGAGTCTTAC 5' respectively.
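
The two preferred type specific primers quoted above can be compared directly to confirm that they differ only at the 3' terminal nucleotide, as expected for primers that discriminate a single base substitution. The following check is a minimal sketch; the variable and function names are hypothetical, and the sequences are entered in the 3' to 5' orientation in which they are written in the text.

```python
# Preferred type specific primers, written 3' -> 5' as in the description.
PRIMER_1_3TO5 = "CGGTCCCCACTGAGTCTTAC"
PRIMER_2_3TO5 = "TGGTCCCCACTGAGTCTTAC"


def differing_positions(seq_1: str, seq_2: str) -> list:
    """Return (index, base_1, base_2) for every position where the primers differ."""
    if len(seq_1) != len(seq_2):
        raise ValueError("primers must be the same length to compare position by position")
    return [(i, x, y) for i, (x, y) in enumerate(zip(seq_1, seq_2)) if x != y]


diffs = differing_positions(PRIMER_1_3TO5, PRIMER_2_3TO5)
print(diffs)  # [(0, 'C', 'T')] : index 0 is the 3' terminal base

# The same primers in the conventional 5' -> 3' orientation.
primer_1_5to3 = PRIMER_1_3TO5[::-1]
primer_2_5to3 = PRIMER_2_3TO5[::-1]
print(primer_1_5to3, primer_2_5to3)
```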

Where at least two type specific primers are employed their relevant concentration ratio(s) may be selected so as
to reflect their relevant hybridisation characteristics. Thus for example in respect of the sequences
3'CGGTCCCCACTGAGTCTTAC 5' and 3'TGGTCCCCACTGAGTCTTAC 5' a convenient ratio is about 2:1. Alternatively, equal
concentrations of type specific primers may be used and appropriate mismatches introduced elsewhere in the primer sequence
(see for example our European patent, publication no. 0332435).

As explained earlier above, a large number of tandemly repeated regions have now been reported in the literature.
The skilled man is able to determine whether a given region is suitable for use in the method of the invention by any
convenient analysis of the internal repeat unit structure, for example by direct sequencing techniques. In general,
minisatellite regions will be selected for analysis since most of the microsatellite loci, and particularly those based on
dinucleotide repeats, show complex multi-band patterns per allele on DNA sequencing gels which appear to arise through
Taq polymerase slippage at dinucleotide repeats during amplification (Weber and May, 1989) and through
non-templated nucleotide addition catalysed by Taq polymerase (Clark, 1988). As a result, it is sometimes difficult to determine
with confidence the true size of a given allele at microsatellites.

To date we have identified convenient tandemly repeated regions comprised in the minisatellite regions MS31,
MS32, and MS1. These minisatellite regions are claimed in our UK patent no. 2188323 and corresponding worldwide
patent applications. A particular tandemly repeated region is MS32.

The tandemly repeated region is, in general, identified either by unique internal repeat sequence or by unique
flanking sequence. Alternatively, and less preferably, the region of interest is isolated from the sample mixture using known
separation procedures, for example, by sample digestion and use of a single locus probe specific for a region at a
distance from the region of interest.

A significant advantage of our claimed method is that the genomic DNA sample to be tested does not require any
elaborate pre-treatment. Thus the DNA sample may comprise total genomic DNA, including mitochondrial DNA, and
the analysis of both maternal and paternal alleles of the selected tandemly repeated region(s) can be readily carried out.

By "genomic DNA" we mean nucleic acid, such as DNA, from any convenient animal or plant species, such as
humans, cattle, and horses, especially humans. Known DNA typing procedures have already been effected on a wide
variety of species. The minisatellite regions MS1, MS31 and MS32 have proved to be human specific and accordingly
are not believed to be suitable for the characterisation of non-human samples.

Where a common primer is used this may hybridise to any convenient locus flanking the tandemly repeated region
provided that informative amplification products are obtained. In general, the common primer is selected so that the
resulting set(s) of amplification products may be conveniently separated according to size by gel electrophoresis. In
respect of the MS32 minisatellite we have previously disclosed (Jeffreys et al., 1988 - European patent application,
publication no. 0370719, page 18 and Figure 11) over 300 bases of 3' flanking sequence and provided examples of
convenient primers.

The flanking locus is advantageously polymorphic since the test sample may be further characterised with respect
to any informative sequence polymorphism at this locus. By "informative sequence polymorphism" we mean any
sequence polymorphism which provides a useful degree of information within a population to be analysed. Convenient
polymorphisms are in general detected in about 1% - 50% of a given population, such as in up to 2%, up to 5% or up
to 10% of individuals.

Amplification of a selected sequence variant of the common locus is conveniently effected using a type specific
common primer in a manner directly analogous to the repeat unit type specific primers of the present invention. Thus,
the type specific common primer is extended in the presence of appropriate nucleoside triphosphates and an agent for 
polymerisation thereof to amplify a set of amplification products comprising the selected sequence variant. The type 
specific common primers are conveniently designed and produced as described earlier above with reference to the type 
specific primers and our European patent, publication no. 0332435. 

The above aspect of the invention may advantageously be used to characterise the test sample of genomic DNA
in respect of either or both maternal and paternal alleles without prior separation of the alleles. By way of example,
sample DNA from an individual who is heterozygous for a selected variant of the common locus will only give rise to type
specific common primer amplification products from one allele. Similarly, sample DNA from an individual who does not
possess the selected variant will not give rise to any common primer amplification products. Any such results may be
conveniently verified by using a non-type specific common primer at the same common locus to provide amplification
products for both alleles. In general, for routine characterisation purposes a non-type specific common primer will be
employed to obtain information from both alleles.

The preceding aspects of the method of the invention using sequence variants of the common primer locus to effect
allele "knockout" are based on the unexpected discovery that primer 32D (Jeffreys et al., 1988 - Figure 11) hybridises to
a region comprising a polymorphic site. Accordingly, in a convenient aspect the tandemly repeated region is comprised
in MS32 and the polymorphic site in the flanking region is comprised in the locus to which
5' CGACTCGCAGATGGAGCAATGGCC 3' (primer 32D) hybridises. Convenient type specific common primers for this locus comprise the
sequence 5' GCAGATGGAGCAATG 3' such as 5' CGACTCGCAGATGGAGCAATG 3' (primer 32D2). Convenient
non-type specific common primers for this locus comprise the sequence 5' GCAGATGGAGCAATGGCC 3' such as
5' CGACTCGCAGATGGAGCAATGGCC 3' (primer 32D).

A further significant advantage is that our claimed method may be carried out using a partially degraded DNA
sample. The only requirement is that at least a part of the tandemly repeated region to be analyzed can be amplified to
provide a sample code. In respect of the MS32 minisatellite we now provide a non-type specific common primer (32O)
5' GAGTAGTTTGGTGGGAAGGGTGGT 3' which is particularly useful with partially degraded samples since it hybridises
to a region directly adjacent to the tandemly repeated region.

In a further aspect of the method of the present invention two or more sets of differentially labelled amplification
products are prepared simultaneously. Convenient labels include specific binding substances such as biotin/avidin and
also immunogenic specific binding substances. Further convenient labels include chromophores and/or fluorophores
such as fluorescein and/or rhodamine. In general the relevant primers are labelled although other methods of providing
labelled amplification products are not excluded.

In any relevant preceding aspect of the present invention different type specific primers may comprise different tail
sequences to facilitate separation of the amplification products.

The method of the present invention may be used to characterise more than one tandemly repeated region, for
example by simultaneous amplification using appropriate type specific primers. This enables a more detailed sample
code to be obtained. Thus, by way of example the minisatellite regions MS1, MS31 and MS32 may be amplified at the
same time.

As mentioned earlier above a significant advantage of the method of the present invention is that it provides a
sample code individual to the genomic DNA sample. Depending on the procedure used to separate the set(s) of
amplification products the sample code may already be in machine readable form, for example suitable for scanning and digital
encoding.

Therefore according to a further aspect of the present invention we provide an individual sample code prepared
according to any preceding aspect of the present invention. The sample code may be based on any convenient number
of coding states such as at least 2, for example at least 3, at least 4, at least 5, at least 6, at least 7, at least 8 or at least
9 coding states. All the above are independent and convenient numbers of coding states. In general where the method
of the present invention is performed using two type specific primers on total genomic DNA the profile of both alleles
can be conveniently represented as a ternary code for each internal repeat position, i.e. both type a, both type t or
heterozygous type at.
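
A ternary code of this kind may be sketched as follows for two alleles whose repeat maps are aligned position by position. The numbering used here (1 for both type a, 2 for both type t, 3 for heterozygous at) is an assumption made for illustration, as are the allele strings and the function name; for simplicity the two alleles are taken to be of equal length.

```python
# Assumed numbering of the three coding states (illustrative only).
TERNARY_STATES = {("a", "a"): "1", ("t", "t"): "2", ("a", "t"): "3"}


def ternary_code(allele_1: str, allele_2: str) -> str:
    """Combine two aligned repeat maps into one ternary code per repeat position."""
    if len(allele_1) != len(allele_2):
        raise ValueError("this sketch assumes two alleles of equal length")
    # Sorting each pair makes a/t and t/a the same heterozygous state.
    return "".join(TERNARY_STATES[tuple(sorted(pair))]
                   for pair in zip(allele_1, allele_2))


# Hypothetical maternal and paternal repeat maps.
print(ternary_code("attaat", "tataaa"))  # 332113
```

Because the code is read from superimposed ladders, the order of the two alleles is immaterial: swapping the arguments gives the same result.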

In respect of the MS32 minisatellite we have identified a further distinct repeat type O ("null") which is not primed
by either type specific primer a or type specific primer t. This further repeat type allows the sample code to be expanded
by three further coding states aO, tO, and OO. Runs of apparent "null" repeats can arise in the coding ladder beyond
the end of short alleles; such non-existent "null" repeats can be reliably identified by MVR-PCR both of separated
alleles and in total genomic DNA. Null or O type repeats can also arise from additional repeat unit sequence variants within
alleles which differ enough to prevent priming by either a-type or t-type repeat primers. 1.6% of repeat units within
Caucasian alleles are O-type repeats and can be accurately identified in separated alleles by the absence of specific
internal rungs on the MVR ladder. However, their detection in total genomic DNA requires correct interpretation of
MVR-PCR band intensities (dosage). While the correct discrimination of heterozygous null positions (for example,
homozygous a/a versus heterozygous a/O) has very little effect on the power of digital codes in individual identification,
correct identification of heterozygous null codes is important when using diploid codes for parentage analysis. For
example, suppose that, at a given repeat position, the mother is t/t, the father a/O and the child t/O; mis-scoring of the
father as a/a or the child as t/t would lead to a false paternal exclusion, exactly analogous to the problems created by
null alleles at classical marker systems.
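
The parentage example just given can be checked mechanically at a single repeat position. The sketch below assumes simple Mendelian transmission of one repeat type from each parent and ignores mutation; the function name and the tuple representation of a diploid position are hypothetical.

```python
def position_compatible(mother: tuple, father: tuple, child: tuple) -> bool:
    """True if the child's two repeat types at one position can be formed by
    taking one type from the mother and one from the father."""
    return any(sorted((m, f)) == sorted(child) for m in mother for f in father)


mother = ("t", "t")
child = ("t", "O")

# Correctly scored father a/O: the child inherits t from the mother, O from the father.
print(position_compatible(mother, ("a", "O"), child))   # True

# Father mis-scored as a/a: a false paternal exclusion.
print(position_compatible(mother, ("a", "a"), child))   # False

# Child mis-scored as t/t with the father correctly scored: also a false exclusion.
print(position_compatible(mother, ("a", "O"), ("t", "t")))  # False
```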

We also provide a database which comprises a multiplicity of individual sample codes prepared as above. The
individual sample codes are preferably derived from a tandemly repeated region within the MS32 minisatellite. Whilst we
do not wish to be limited by any theoretical considerations we have found that variation within the MS32 minisatellite
appears to be associated with a clustering of mutation events at the 3' end of the minisatellite. Therefore the individual
sample code is preferably derived from at least 5 or at least 10 internal repeat units at the 3' end of the minisatellite,
more preferably from at least 15 internal repeat units at the 3' end of the minisatellite. By "at the 3' end of the
minisatellite" we mean within 100 internal repeat units, such as within 50, 40, 30, or 20 repeat units of the 3' end of the
minisatellite. The 3' end of the minisatellite is defined with reference to 5'-3' extension of the type specific primers
3'CGGTCCCCACTGAGTCTTAC 5' and 3'TGGTCCCCACTGAGTCTTAC 5' and the MS32 3' flanking sequence
disclosed, for example on page 18 of our European patent application, publication no. 0370719.

The above database may be established and used for any convenient characterisation purposes, such as in the
identification of individuals and the determination of individual relationships.

The present invention also provides a type specific primer to prime selectively internal repeat units of a specific type
within a tandemly repeated region comprised in any one of MS1, MS31 or MS32. Convenient type specific primers of
the present invention include those used where the tandemly repeated region is comprised in MS32 and the specific
type of internal repeat unit arises from the presence or absence of a HaeIII site in the repeat units, such as a set of two
type specific primers which comprise the following sequences 3'CGGTCCCCACTGAGT 5' and
3'TGGTCCCCACTGAGT 5' respectively, such as 3'CGGTCCCCACTGAGTCTTAC 5' and 3'TGGTCCCCACTGAGTCTTAC 5'. In respect
of the MS31A minisatellite convenient type specific primers comprise the following sequences
AGGTGGAGGGTGTCTGTGA and GGGTGGAGGGTGTCTGTGA.

In a further aspect of the present invention we provide a test kit which comprises at least two complementary type
specific primer(s) as defined above together with optional common primer as defined above and/or optional tail specific
primer as defined above, the test kit further including appropriate buffer, packaging and instructions for use. The test kit
conveniently further comprises appropriate nucleoside triphosphates and/or an agent for polymerisation thereof.
Additional optional items for inclusion in the test kit include control DNA of known profile, an optionally labelled probe for the
tandemly repeated region and a probe detection system.






The invention also relates to a computer when programmed to record individual sample codes as defined above.
Further independent aspects of the present invention relate to a computer when programmed to search for similarities 
between individual sample codes as defined above and to a computer when programmed to interrogate a database as 
defined above. 

The term "tandem repeat" is used herein to refer to at least 2 repeats of a sequence comprising at least one
sequence polymorphism in a given population. In general the tandemly repeated region used in the method of the
present invention comprises at least 5, or at least 10, or at least 15 tandem repeats, such as at least 20 or at least 30,
40, 50 or at least 100 tandem repeats.

MVR diploid codes offer many major advantages over currently used DNA typing systems that involve length
measurements of VNTR (minisatellite or microsatellite) alleles, including the following:

1. MVR coding does not involve error-prone fragment length measurement, and provides for the first time digital
typing information ideally suited to computer databasing.

2. The MVR-PCR profiles include major informational redundancy useful for confirming code authenticity.

3. Code generation does not require standardization of electrophoretic systems.

4. Laboratories using MVR-PCR can readily check the authenticity of their codes by including a standard individual
of known code, preferably an individual containing examples of the scarce codes 4(aO), 5(tO) and 6(OO).

5. Criteria for declaring a match between a forensic sample and a criminal suspect are no longer subjective, since
samples match if the MVR codes are indistinguishable.

6. Side-by-side comparisons of DNA samples on the same gel are no longer necessary. This will enable forensic
laboratories to segregate forensic samples away from suspects, minimizing the risk of sample mix-up.

7. MVR-PCR is capable of generating information from degraded DNA, trace amounts of DNA and, in some
circumstances, mixed DNA samples. MVR coding is also technically simple and should therefore be suitable for routine
forensic investigations.

8. All participating laboratories can contribute code data to generate very large communal population and
investigative databases. Given the fact that current estimates suggest in excess of 6000 different MS32 alleles, and
therefore >2 x 10^7 diploid codes, it is likely that very large databases can be constructed before any significant saturation
of MS32 MVR code types occurs. (Note that this will be true for unrelated individuals but not for siblings, since a
pair of siblings will have an approximately 1/4 chance of sharing the same parental alleles and therefore MVR
code).

9. Very large communal databases provide a simple method for determining the statistical significance of a match
between a forensic sample and a suspect simply by determining the frequency (probably zero) of the particular
MVR code in the communal database. The evidence presented, for example in a court, is therefore reduced to a
simple statement that the code in the forensic sample and suspect match, and that the code has not been seen in
x other individuals typed from the appropriate ethnic group. This approach uses phenotypic frequencies rather than
genotype frequencies deduced under assumptions of Hardy-Weinberg equilibrium. Cumulative typing of isolated
communities can also be used to determine whether MVR code matching frequencies can be significantly
perturbed by inbreeding.

10. MVR codes also provide a method for parentage testing, where again the statistical evaluation of the
significance of a paternal match can be simply estimated by determining the proportion of individuals in the appropriate
communal database who are not excluded as non-fathers.

11. All paternity cases where parentage is established automatically yield haplotype data on the four parental
alleles which can be accumulated within a very large communal allele database, useful for defining allele diversity and
frequencies more precisely.
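The arithmetic behind items 8 and 9 can be sketched as follows. The Python below is a minimal illustration under stated assumptions: the number of unordered allele pairs, n(n+1)/2, is used as an upper bound on the number of distinct diploid codes, and the database is a hypothetical list of code strings. The function names are not taken from the specification.

```python
from collections import Counter


def diploid_code_upper_bound(n_alleles):
    """Number of unordered allele pairs, homozygotes included: n(n+1)/2."""
    return n_alleles * (n_alleles + 1) // 2


def code_frequency(database, query_code):
    """Return (times the query code was seen, number of individuals typed)."""
    counts = Counter(database)
    return counts[query_code], len(database)


# Exactly 6000 alleles gives about 1.8 x 10^7 pairs, the same order of
# magnitude as the figure quoted in item 8 for "in excess of 6000" alleles.
pairs = diploid_code_upper_bound(6000)

# Hypothetical communal database of diploid codes, one string per individual.
toy_database = ["1323", "2213", "1323", "3331"]
seen, typed = code_frequency(toy_database, "2121")
print(pairs, seen, typed)
```

A query code that is absent from the database returns a count of zero together with the number of individuals typed, which corresponds to the form of statement described in item 9.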


The term "set of amplification products" is used herein to refer to a plurality of amplification products which identify 
the relative positions of internal repeat units of a specific type within the tandemly repeated region. Any convenient 
number of amplification products are comprised in the set such as at least 2, at least 5, at least 10, at least 15, at least
20, or at least 30 amplification products.

The term "more than one type of repeat unit" is used herein to refer to types of internal repeat units within the
tandemly repeated region which may be distinguished according to an informative sequence variation. By way of example
the presence or absence of a particular restriction site in a repeat unit provides two types of repeat unit, i.e. a first type
of repeat unit which comprises the particular restriction site and a second type which does not comprise the
particular restriction site. Accordingly the first and second types of repeat unit are "internal repeat units of a specific type".
It will be understood that a further informative sequence variation between internal repeat units provides further types
of repeat unit and allows further and independent characterisation of the tandemly repeated region.

The term "informative sequence variation" is used herein to indicate sequence variation which provides a useful
degree of information within a population to be analysed.

The term "nucleoside triphosphate" is used herein to refer to nucleosides present in either DNA or RNA and thus
includes nucleosides which incorporate adenine, cytosine, guanine, thymine and uracil as base, the sugar moiety being
deoxyribose or ribose. In general deoxyribonucleosides will be employed in combination with a DNA polymerase. It will
be appreciated however that other modified bases capable of base pairing with one of the conventional bases adenine,
cytosine, guanine, thymine and uracil may be employed. Such modified bases include for example 8-azaguanine and
hypoxanthine.

The term "nucleotide" as used herein can refer to nucleotides present in either DNA or RNA and thus includes
nucleotides which incorporate adenine, cytosine, guanine, thymine and uracil as base, the sugar moiety being
deoxyribose or ribose. It will be appreciated however that other modified bases capable of base pairing with one of the
conventional bases, adenine, cytosine, guanine, thymine and uracil, may be used in the diagnostic primer and amplification
primer employed in the present invention. Such modified bases include for example 8-azaguanine and hypoxanthine.

In addition, it will be understood that references to nucleotide(s), oligonucleotide(s) and the like include analogous
species wherein the sugar-phosphate backbone is modified and/or replaced, provided that its hybridisation properties 
are not destroyed. By way of example the backbone may be replaced by an equivalent synthetic peptide. 

As outlined above the method of the present invention is particularly applicable to MS32 alleles of any length and
also applicable to total genomic DNA to display the superimposed MVR maps of both alleles, thereby generating a
ternary, rather than binary code (Fig. 1A). The approach is outlined in Fig. 1B, C and uses two MVR-type specific
primers/amplimers. Each amplimer consists of 20 nucleotides of MS32 repeat unit terminating at the Hae III +/- variable site
and differing at the 3' end such that one amplimer may only prime off a-type repeat units and the other amplimer only
off t-type repeats. Amplification using one or other MVR-type specific primer together with amplimer 32D from the
minisatellite flanking DNA will generate two complementary sets of products from the ultravariable end of a given MS32
allele, from which the MVR maps can be deduced. However, PCR products may progressively shorten at each PCR
cycle, due to the MVR-specific amplimer priming internally within PCR products, to generate eventually a set of PCR
products extending from the 32D flanking site to, at most, the first few repeat units. To prevent any such collapse, each
MVR-specific primer preferably carries an identical 20nt 5' extension "TAG" to create oligonucleotides 32-TAG-A and
32-TAG-T. Duplicate PCR amplifications are carried out with a very low concentration of 32-TAG-A or 32-TAG-T plus
high concentrations of 32D and the TAG sequence itself. At each cycle of PCR, each MVR-specific primer will prime
from one of its complementary repeat units within the minisatellite and extend into the flanking DNA past the 32D
priming site. At the next PCR cycle, 32D will prime on the first product and extend back across the minisatellite, terminating
at the TAG sequence and creating a sequence complementary to TAG from which the TAG primer can now prime. At
the next PCR cycle, this second PCR product can now amplify, and is much more likely to be amplified by 32D and TAG,
rather than 32D and the MVR-specific primer, since TAG is present at high concentration. As a result, a stable set of
PCR products will be generated, extending from the 32D priming site to each a-type or t-type repeat unit, depending on
the MVR-specific primer used. Any internal priming off PCR products by 32-TAG-A or 32-TAG-T may create authentic
but relatively short PCR products in each reaction. By adjusting the concentration of 32-TAG-A and 32-TAG-T relative
to 32D and TAG, it is possible to create sets of PCR products extending at least 80 repeat units (2.3 kb) into the
minisatellite.
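The complementary product sets described above can be illustrated with a short Python sketch. It rests on assumptions that are not stated in this form in the text: an allele is written as a string of "a", "t" and "O" (null) repeat units read from the 32D flanking end, the repeat length of 29 bp is inferred from "80 repeat units (2.3 kb)", and the flank_bp offset is a hypothetical parameter covering the distance from the 32D priming site to the first repeat unit.

```python
REPEAT_BP = 29  # assumption: 2.3 kb / 80 repeat units is roughly 29 bp


def mvr_tracks(allele):
    """Return the repeat positions (1-based) primed in the A and T reactions."""
    a_track = [i for i, unit in enumerate(allele, start=1) if unit == "a"]
    t_track = [i for i, unit in enumerate(allele, start=1) if unit == "t"]
    return a_track, t_track


def approximate_sizes(positions, flank_bp=0):
    """Approximate product size in bp for each primed repeat position."""
    return [flank_bp + position * REPEAT_BP for position in positions]


# Hypothetical allele with one null (O-type) repeat at position 6.
a_track, t_track = mvr_tracks("aattaO")
print(a_track, t_track)
print(approximate_sizes(a_track, flank_bp=100))
```

Each a-type or t-type position appears in exactly one of the two tracks, while a null repeat appears in neither.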

As explained earlier above the use of a tail (or TAG) specific primer which hybridises to the complement of the tail
sequence in the extension product of the common primer prevents internal priming within the tandemly repeated region
and subsequent shortening of the amplification products at each amplification cycle. We now disclose that the above
principle may be applied to any convenient detection method involving amplification by primer extension. In known
procedures comprising a polymerase chain reaction mispriming can occur at each amplification cycle, especially where the
primer is used to detect for example single base mismatches or to detect a particular sequence against a background
of related sequences. Such mispriming may only occur as a very low percentage of total priming events per
amplification cycle but will increase significantly as a function of the overall number of cycles. The present invention now provides
a two stage procedure wherein as a first stage the initial diagnostic interaction between a primer comprising a tail
sequence and a sample template may be conducted at optimum hybridisation stringency. Any primer extension products
are then amplified using a further primer. As a second stage the above extension products are then amplified using a
tail specific primer and the further primer. Accordingly, whilst mispriming may still occur the overall level may be
significantly reduced.

Therefore, according to a further and independent aspect of the present invention we provide a novel method for
detecting the presence or absence of at least one diagnostic base sequence in one or more nucleic acids contained in
a sample, which method comprises contacting the sample with a diagnostic primer for each diagnostic base sequence
in the sample nucleic acid, the nucleotide sequence of each diagnostic primer being such that it is substantially
complementary to the corresponding diagnostic base sequence, under hybridising conditions and in the presence of
appropriate nucleoside triphosphates and an agent for polymerisation thereof, such that an extension product of a diagnostic
primer is synthesised when the corresponding diagnostic base sequence is present in the sample, no extension product
being synthesised when the corresponding diagnostic base sequence is not present in the sample and any extension
product of a diagnostic primer acts as template for extension of a further primer which hybridises to a locus at a distance
from the relevant diagnostic base sequence, and wherein at least one of the diagnostic primer(s) further comprises a
tail sequence which does not hybridise to a diagnostic base sequence or a region adjacent thereto, and contacting the
above mixture with a tail primer which hybridises to the complement of the tail sequence in an extension product of the
further primer and is extended in the presence of appropriate nucleoside triphosphates and an agent for polymerisation
thereof to amplify the further primer amplification products whereby the presence or absence of the diagnostic base
sequence(s) in the nucleic acid sample is detected from the presence or absence of tail specific primer extension
product.

The above method is of particular use, for example, where diagnostic base sequence(s) are only present in low
concentration in complex nucleic acid mixtures.

In our European Patent, Publication No. 0332435, the contents of which are incorporated herein by reference, we
disclose and claim a method for the selective amplification of template sequences which differ by as little as one base. 
The above method is now commonly referred to as the Amplification Refractory Mutation System (ARMS). 

Therefore in a preferred aspect of the above detection method a terminal nucleotide of at least one diagnostic
primer is either complementary to a suspected variant nucleotide or to the corresponding normal nucleotide, such that
an extension product of a diagnostic primer is synthesised when the terminal nucleotide of the diagnostic primer is
complementary to the corresponding nucleotide in the diagnostic base sequence, no extension product being synthesised
when the terminal nucleotide of the diagnostic primer is not complementary to the corresponding nucleotide in the
diagnostic base sequence.

The diagnostic primers for use in the preceding aspect are conveniently designed with reference to our above
mentioned European Patent, Publication No. 0332435.

By "substantially complementary" we mean that primer sequence need not reflect the exact sequence of the
template provided that under hybridising conditions the primers are capable of fulfilling their stated purpose. In general,
mismatched bases are introduced into the primer sequence to provide altered hybridisation stringencies. Commonly,
however, the primers have exact complementarity except in so far as non-complementary nucleotides may be present
at a predetermined primer terminus as hereinbefore described.

In the diagnosis of, for example, cancer the situation may arise whereby it is desirable to identify a small population
of variant cells in a background of normal cells. The ARMS system is well suited for this purpose since it discriminates
between normal and variant sequences even where the variant sequence comprises a very small fraction of the total
DNA. Whilst we do not wish to be limited by theoretical considerations we have successfully performed ARMS assays
in which the ratio of mutant to normal DNA was 1:100 and we believe that even larger ratios may be readily used. To
optimise the sensitivity of the ARMS reaction it may be performed in isolation, i.e. with a single ARMS primer, since in
duplex or multiplex reactions there may be competitive interaction between the individual reactions resulting in a loss of
sensitivity. A control reaction is desirable to ensure that a polymerase chain reaction has taken place. In a test for an
inherited mutation the copy number of the mutation and other genomic DNA is typically 1:1 or 1:2, so a genomic control
reaction can be used without compromising sensitivity or creating an imbalance in the system. In a cancer test however, the
use of a genomic control reaction may swamp the test reaction leading to a loss of sensitivity. We have now found that
ARMS primer(s) comprising tail sequences may advantageously be used in a two stage amplification procedure
comprising a genomic control reaction. In the first stage ARMS primer(s) comprising non-complementary tail(s) are used to
amplify any variant sequence which may be present. In addition to the ARMS reaction a genomic control reaction is
performed in the same reaction vessel using primers at very low concentration. The control reaction primers also have
non-homologous tails which may or may not have the same sequence as the ARMS primer tail(s). In the second stage tail
specific primers are added and the temperature increased to prevent the original genomic control primers from
functioning. In this second stage any variant sequence product is further amplified and the product of the control reaction
from the first stage is also amplified to give a detectable product. Thus the ARMS reaction will only take place if variant
sequence is present in the original sample and the control reaction will only function if both the first and second stage
amplification reactions have worked.
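The logic of the two stage test with its genomic control can be summarised as a small decision table. The Python sketch below is illustrative only: the function name and result strings are hypothetical, and treating every run that lacks a control product as invalid is an assumption drawn from the statement that the control reaction only functions when both stages have worked.

```python
def interpret_two_stage_arms(arms_product, control_product):
    """Classify one reaction from the presence (True) or absence (False) of each product."""
    if not control_product:
        # Assumption: without a control product the amplification cannot be trusted.
        return "invalid run"
    if arms_product:
        return "variant sequence detected"
    return "variant sequence not detected"


for arms, control in [(True, True), (False, True), (True, False), (False, False)]:
    print(arms, control, interpret_two_stage_arms(arms, control))
```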

A further and important use of ARMS is for detecting the presence or absence of more than one suspected variant
nucleotide in the same sample. The ability of ARMS to selectively amplify sequences depending on the predetermined
nucleotide sequence of the diagnostic primers enables multiple amplification products to be distinguished simply,
accurately and with minimal operator skill thus making it possible to provide a robust technique for screening a single sample
for multiple nucleotide variations. The use of ARMS to detect more than one suspected variant nucleotide in the same
sample is conveniently referred to as multiplex ARMS. Multiplex ARMS is thus of particular interest in screening a single
sample of DNA or RNA for a battery of inherited conditions such as genetic disorders, predispositions and somatic
mutations leading to various diseases. Such DNA or RNA may for example be extracted from blood or tissue material
such as chorionic villi or amniotic cells by a variety of techniques such as those described by Maniatis et al., Molecular
Cloning (1982), 280-281. Moreover as the molecular basis for further inherited conditions becomes known these further
conditions may simply be included in the screening technique of the present invention.

Multiple amplification products may be distinguished by a variety of techniques. Thus for example probes may be
employed for each suspected amplified product, each probe carrying a different and distinguishable signal or residue
capable of producing a signal.

A much simpler and preferred method of distinguishing between ARMS amplification products comprises selecting
the nucleotide sequences of the amplification primers such that the length of each amplified product formed during the
process of the present invention is different. In this regard the number of base pairs present in an amplification product
is dictated by the distance apart of the diagnostic and amplification primers. Thus the amplification primers may be
designed such that each potential variant nucleotide is associated with a potential amplification product of different
length.
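A primer design of this kind can be screened for products that would be too close in length to tell apart. The following Python sketch is a hedged illustration: the variant labels, the product lengths and the minimum resolvable gap are all hypothetical values chosen for the example, not figures from the specification.

```python
from itertools import combinations


def unresolved_pairs(product_lengths, min_gap_bp):
    """Return pairs of variants whose product lengths differ by less than min_gap_bp."""
    items = sorted(product_lengths.items())
    return [
        (name_a, name_b)
        for (name_a, len_a), (name_b, len_b) in combinations(items, 2)
        if abs(len_a - len_b) < min_gap_bp
    ]


# Hypothetical design: variant label -> expected product length in base pairs.
design = {"variant_1": 160, "variant_2": 240, "variant_3": 245}
clashes = unresolved_pairs(design, min_gap_bp=20)
print(clashes)
```

An empty result would indicate that every potential variant is associated with a product of distinguishable length under the chosen threshold.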

In an ARMS reaction diagnostic for a particular point mutation the sequence of the primers is largely constrained
by the sequence of the DNA adjacent the mutation of interest. The 3' base of the primer usually matches the base
altered by the mutation and extra destabilisation is introduced to give the required level of specificity. The term
"specificity" refers to the ratio of the yield of product when an ARMS primer is used to prime its target sequence compared to
the yield of mis-primed product from the non-target sequence.

In a multiplex ARMS reaction it is desirable that the individual ARMS reactions work with similar efficiency to allow
the simultaneous detection of all the reaction products. This may be achieved for example by altering the concentration
of the primers, alteration of the number/composition of reactions, or alteration of the amount of additional
destabilisations introduced into the ARMS primers. Whilst these methods are normally sufficient to obtain a balanced multiplex
ARMS reaction the use of tail or tag sequences may have advantages in certain situations. In particular these may allow
a more specific test. By way of example, where a strong additional mis-match is used to obtain specificity the yield of
corresponding multiplex product may be low. Reducing the additional mis-match strength may not be possible without
compromising specificity. A tail sequence which in combination with a tail specific primer provides a good substrate for
a DNA polymerase may be used to balance the multiplex reaction. A range of tail/primer combinations of known priming
ability may be provided. Thus by way of example as a first amplification step the priming/mis-priming ratio is optimised
without regard to product yield. Product yield is then balanced in the second amplification step using an appropriate
range of tail/primer combinations.

In our UK patent application no. 9201686.4 we disclose and claim that multiplex ARMS may be successfully
performed where diagnostic primer extension products of more than one diagnostic base sequence of a nucleic acid
sample comprise a complementary overlap. This unexpected improvement to multiplex ARMS is referred to hereinafter as
overARMS. OverARMS now facilitates the detection and analysis of, for example, inherited or infectious disease where
the potential variant nucleotides are closely spaced.

Therefore in a further aspect of the claimed detection method the (potential) extension products of at least two diag- 
nostic primers comprise a complementary overlap. The overlap may occur due to any convenient arrangement of the 
diagnostic primers. Thus for example the diagnostic primers may conveniently be opposed as illustrated in Figure 8 (i). 

Furthermore we have found that ARMS may be successfully performed where the diagnostic primer(s) for more
than one diagnostic base sequence in a nucleic acid sample themselves comprise a complementary overlap. 

Therefore in a further aspect of the claimed detection method at least two diagnostic primers themselves comprise 
a complementary overlap. Thus for example the primers are superimposed on the same strand as illustrated in Figure 
8 (iii) or less preferably overlap as illustrated in Figure 8 (ii). More conveniently the primers are nested as illustrated in
Figure 8 (iv).

In an overARMS reaction the size of the reaction products can be used to identify individual combinations of variant 
nucleotides. Where the products are separated for example on an agarose gel this approach may be limited by the 
resolving power of the gel. By way of example in a high resolution agarose gel overARMS may presently be used to 
identify mutations within about 10-15 bases of each other. The size of the outer overARMS primer was increased to give
a larger product and we surprisingly found that the yield of the smaller overARMS product was significantly reduced.
Whilst we do not wish to be limited by theoretical considerations we believe that target masking takes place due to the 
increased Tm of the larger overARMS primer which binds preferentially to the target DNA and prevents the smaller 
overARMS primer from hybridising. Use of a tailed outer overARMS primer may provide the increased product size nec- 
essary for resolution but since it is non-complementary at its 5' end the Tm will be similar to the smaller primer. 

OverARMS is conveniently used for HLA typing, in the diagnosis of β-thalassaemia, sickle cell anaemia,
phenylketonuria (PKU), Factor VIII and IX blood disorders and α-1-antitrypsin deficiency. A particular use for OverARMS is in the
detection and diagnosis of cystic fibrosis. Convenient cystic fibrosis alleles are disclosed in our European Patent
Application No. 90309420.9; by B. Kerem et al., Science, 1989, 245, 1073-1080; by J.R. Riordan et al., Science, 1989, 245,
1066-1073; by J.M. Rommens et al., Science, 1989, 245, 1059-1065; by G.R. Cutting et al., Nature, 346, 366-368; by M.
Dean et al., Cell, 61, 863-870; by K. Kobayashi et al., Am. J. Hum. Genet., 1990, 47, 611-615; by B. Kerem et al., Proc.
Natl. Acad. Sci. USA, 1990, 87, 8447; by M. Vidaud et al., Human Genetics, 1990, 85, (4), 446-449; and by M.B. White
et al., Nature, 344, 665-667.

Our two stage amplification process using diagnostic and tail primers in combination with a further common primer
is conveniently carried out using all three primers simultaneously and preferably using a ratio of tail specific and/or
further primer(s) to diagnostic primer(s) of at least 1:1, such as at least 20:1, at least 30:1, and at least 40:1, preferably at
least 50:1.

We also provide a kit for detecting the presence or absence of at least one diagnostic base sequence in one or
more nucleic acids contained in a sample, which kit comprises a diagnostic primer for each diagnostic base sequence,
the nucleotide sequence of each diagnostic primer being such that it is substantially complementary to the
corresponding diagnostic base sequence, such that under hybridising conditions and in the presence of appropriate nucleoside
triphosphates and an agent for polymerisation thereof an extension product of each diagnostic primer is synthesised
when the corresponding diagnostic base sequence is present in the sample, no extension product being synthesised
when the corresponding diagnostic base sequence is not present in the sample and wherein at least one of the
diagnostic primer(s) further comprises a tail sequence which does not hybridise to a diagnostic base sequence or a region
adjacent thereto, together with appropriate buffer, packaging and instructions for use.
The kit conveniently further comprises at least one of the following items: 

(i) each of four different nucleoside triphosphates

(ii) an agent for polymerisation of the nucleoside triphosphates in (i)

(iii) tail specific primer(s)

(iv) a further primer which hybridises to a region at a distance from the diagnostic region(s) to which the diagnostic
primer(s) selectively hybridise.

The kit conveniently comprises a set of two diagnostic primers for each diagnostic portion of a target base
sequence, a terminal nucleotide of one diagnostic primer being complementary to a suspected variant nucleotide
associated with a known genetic disorder and a terminal nucleotide of the other diagnostic primer being complementary to
the corresponding normal nucleotide.

The invention will now be further described by, but not limited to, the following examples, tables and figures wherein:

Figure 1 shows the principles of minisatellite repeat coding. A, minisatellite alleles consisting of interspersed
arrays of two variant repeat units termed a-type (shaded boxes) and t-type (open boxes). Individual alleles can be
encoded as a binary string extending from the first repeat units. In total genomic DNA, a corresponding ternary
code of both alleles superimposed can be generated. At each repeat unit position, the alleles can be both a-type
(code 1), both t-type (code 2), or heterozygous with one a-type and one t-type repeat (code 3). B, the consensus
repeat unit sequence of human minisatellite MS32 (D1S8) showing the polymorphic site which generates Hae III
cleavable (a-type) repeats and Hae III-resistant (t-type) repeats. 32-TAG-A and 32-TAG-T are variant repeat specific
oligonucleotides terminating at this polymorphic site. Each primer consists of 20nt minisatellite repeat sequence
(bold) preceded by a 20nt 5' synthetic non-minisatellite extension identical to the TAG amplimer. C, the principle of
MVR-PCR, illustrated for a single allele amplified using primer 32-TAG-A. 1. At low concentration of primer,
32-TAG-A will anneal to approximately one a-type repeat unit per target minisatellite molecule and extend into the
flanking DNA. 2. Amplimer 32D primes from the flanking DNA, creating a sequence complementary to TAG. 3.
These DNA fragments terminating in 32D and the TAG complement can now be amplified using high concentration
of 32D and TAG amplimers, to create a set of PCR products extending from the flanking 32D site to each a-type
repeat unit. Use of primer 32-TAG-T at stage 1 will create a complementary set of products terminating at each
t-type repeat unit.
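The superimposition of two single-allele maps into a diploid code can be sketched in Python as follows. The sketch uses codes 1 to 3 as defined in the Figure 1 legend and codes 4(aO), 5(tO) and 6(OO) as named elsewhere in this description. Writing alleles as strings of "a", "t" and "O", and padding the shorter allele with "O" beyond its last repeat unit, are illustrative assumptions, as is the function name.

```python
from itertools import zip_longest

# Sorted repeat-unit pair -> diploid code ("O" sorts before "a" and "t").
PAIR_TO_CODE = {"aa": 1, "tt": 2, "at": 3, "Oa": 4, "Ot": 5, "OO": 6}


def diploid_code(allele_1, allele_2):
    """Combine two allele maps, position by position, into a list of codes."""
    return [
        PAIR_TO_CODE["".join(sorted(pair))]
        for pair in zip_longest(allele_1, allele_2, fillvalue="O")
    ]


print(diploid_code("aatt", "atat"))
print(diploid_code("at", "atOa"))
```

Because each pair is sorted before lookup, the result does not depend on which allele is supplied first.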

Figure 2 illustrates examples of minisatellite allele repeat coding by MVR-PCR. MS32 alleles (4.7-18.8kb long
containing 138-630 units) were separated from genomic DNA and amplified using 32-TAG-A (A) or 32-TAG-T (T) in the
presence of high concentration of primers 32D and TAG. PCR products were separated by agarose gel
electrophoresis and detected by Southern blot hybridization with MS32 minisatellite probe. The first repeat unit (asterisk)
is weakly detected and cannot be scored reliably. Null repeat units in allele 4 which do not amplify with either
32-TAG-A or 32-TAG-T are arrowed. Methods: Human genomic DNA previously typed by Southern blot hybridization
with MS32 was digested with MboI, which cleaves outside the minisatellite, and the two alleles from each individual
separated by agarose gel electrophoresis and recovered by electroelution. Aliquots of fractionated DNA
corresponding to 100ng total genomic DNA were amplified in 7µl of 45mM Tris-HCl (pH 8.8), 11mM (NH4)2SO4, 4.5mM
MgCl2, 6.7mM 2-mercaptoethanol, 4.5µM EDTA, 1mM dATP, 1mM dCTP, 1mM dGTP, 1mM dTTP (Pharmacia),
110µg/ml bovine serum albumin (DNase free, Pharmacia) plus 1µM primer 32D, 1µM primer TAG and either 10nM
32-TAG-A or 20nM 32-TAG-T in the presence of 0.25 unit AmpliTaq (Perkin-Elmer-Cetus). Reactions were cycled
for 1.3 min at 96°C, 1 min at 68°C, 1 min at 70°C for 18 cycles on a DNA Thermal Cycler (Perkin-Elmer-Cetus),
followed by a chase for 1 min at 67°C, 10 min at 70°C for 2 cycles. The sequence of the flanking primer 32D is
5'-CGACTCGCAGATGGAGCAATG-3' (Jeffreys et al., Cell, 1990, 60, 473-485). PCR products were electrophoresed
through a 35cm long 1% agarose (Sigma type I) gel in 89mM Tris-borate (pH 8.3), 2mM EDTA, 0.5µg/ml ethidium
bromide alongside φX174 DNA x Hae III until the 110bp marker had reached the end of the gel. DNA was
denatured, transferred by blotting onto Hybond-N (Amersham) and hybridized to 32P-labelled MS32 minisatellite probe
for 3 hours at 65°C as described previously (Wong et al., Ann. Hum. Genet., 1987, 51, 269-288). Autoradiography
was for 6 hours at room temperature.

Figure 3 shows the minisatellite repeat unit composition of MS32 alleles. A, examples of MVR allele codes
generated by MVR-PCR and by partial Hae III cleavage of end-labelled amplified alleles (Jeffreys et al., Cell, 1990, 60,
473-485). Allele codes are shown in the opposite orientation from that used in Jeffreys et al. (1990), and thus extend
into the allele from the more variable end of MS32 alleles. The first repeat unit cannot be reliably scored by
MVR-PCR and is denoted "T". Code discrepancies are indicated by Y. Allele 1, MVR-PCR and Hae III codes fully
concordant; allele 2, presence of null (O-type) repeat units which cannot be primed by either 32-TAG-A or 32-TAG-T;
allele 3, example of a short allele (38 repeat units) showing that MVR-PCR coding extends to the terminal repeat
unit; allele 4, showing the only example of an a/t discrepancy between MVR-PCR and Hae III coding. B, repeat unit
composition of MS32 alleles determined from 32 different Caucasian alleles mapped by MVR-PCR and by Hae III
cleavage. The numbers indicated in the Hae III+ and Hae III- columns represent the numbers of repeat units. C,
probable location of additional minisatellite variant repeats. X, a substitution at the first base of the Hae III site will
destroy the site but not prevent priming by 32-TAG-A, generating a repeat unit scored as a-type by MVR-PCR but
t-type by Hae III (see A4 above). Y, substitution(s) in this region might block priming by both 32-TAG-A and
32-TAG-T to generate null (O-type) repeats.

Figure 4 illustrates MVR-PCR on total human genomic DNA. 200ng samples of Caucasian blood DNA from 9
individuals were amplified with 32-TAG-A (A) or 32-TAG-T (T) and PCR products detected by Southern blot
hybridization with minisatellite probe MS32, as described in Fig. 2 legend. The scale -1-10-20-30-40-50-60- indicates the
repeat unit code position. Scoring commences at code position 1 (second repeat unit into the array). Note that the
A and T tracks are completely complementary for individuals 1-5 and 7-9, and yield ternary codes where 1=aa
(intense band only in A tracks), 2=tt (intense band only in T tracks) and 3=at (relatively faint band in both A and T
tracks). Individual 6 shows a null O-type repeat in one of his alleles, generating an aO (code 4) position (asterisk)
detected as a relatively faint A track band with no band in the T track. This individual also contains a short allele of
28 repeat units, as shown by loss of code 1, 2 and 3 repeats above the arrowed position and the presence of only
code 4 (aO, faint a) and code 5 (tO, faint t) repeats, equivalent to the separated allele profiles shown in Fig. 2. The
presence of this short allele was confirmed by conventional Southern blot hybridization analysis of genomic DNA
with MS32 (data not shown).
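
The coding scheme in this legend can be illustrated with a minimal Python sketch. It assumes each allele is written as a string of repeat types ('a', 't' or 'O') and that positions beyond the end of the shorter allele behave as O-type repeats; the function names are hypothetical and are not part of the original method.

```python
# Code numbers follow the legends: 1=aa, 2=tt, 3=at, 4=aO, 5=tO, 6=OO.
DIPLOID_CODES = {"aa": 1, "tt": 2, "at": 3, "aO": 4, "tO": 5, "OO": 6}
ORDER = {"a": 0, "t": 1, "O": 2}  # canonical ordering within a genotype


def diploid_code(repeat_1, repeat_2):
    """Return the code (1-6) for one position from the repeat type on each allele."""
    pair = "".join(sorted((repeat_1, repeat_2), key=ORDER.__getitem__))
    return DIPLOID_CODES[pair]


def encode(allele_1, allele_2):
    """Combine two allele maps into a diploid code.

    Assumption: positions past the end of the shorter allele are padded with 'O'.
    """
    length = max(len(allele_1), len(allele_2))
    padded_1 = allele_1.ljust(length, "O")
    padded_2 = allele_2.ljust(length, "O")
    return [diploid_code(x, y) for x, y in zip(padded_1, padded_2)]
```

For example, `encode("aat", "atO")` gives codes 1, 3 and 5 at the three positions.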

Figure 5 illustrates individual variation in diploid MVR codes. Codes extending for at least 50 repeat units were
determined for 334 unrelated individuals (177 English, 20 French, 48 Mormon, 2 Amish, 4 Venezuelan and 83
Japanese). A, filled bars, number of code differences seen over the first 50 repeat units, determined for every pairwise
comparison of individuals (55,611 comparisons in total). Open bars, number of differences after removal of band
intensity information, scoring code 1(aa) and 4(aO) as identical and code 2(tt) and 5(tO) as indistinguishable. Mean
numbers of differences per pair of individuals were 30.1 and 27.9 respectively. The Y axis represents the number of
cases and the X axis represents the number of differences in the first 50 repeats. B, expanded plot showing the
frequency distribution of the most similar pairs of MVR codes. All pairwise comparisons showed at least 4 differences
over the first 50 repeats. The Y axis represents the number of cases and the X axis represents the number of
differences in the first 50 repeats. C, examples of (i) the most similar and (ii) most dissimilar pairs of individuals over
the first 50 repeat units, with discordances marked with "x", or ":" for discordances which rely solely on intensity
differences. E=English, M=Mormon and J=Japanese. The complete code scored per individual is shown, together with
additional differences beyond repeat unit 50. Methods: All MVR codes were determined as described in Fig. 4. 62±6
repeat units were scored per individual. Data were stored as ASCII files and analysed using software written in VAX
BASIC V3.4 and run on a VAX 8650 computer operating on VMS 5.3-1.
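
The pairwise comparison described in this legend can be sketched as follows. This is an illustrative Python version, not the original VAX BASIC software; it assumes each diploid code is a list of integers 1-6 and does not model uncertain positions.

```python
from itertools import combinations

# Without band intensity information, 4(aO) is scored like 1(aa) and 5(tO) like 2(tt).
WITHOUT_INTENSITY = {4: 1, 5: 2}


def count_differences(code_x, code_y, use_intensity=True, positions=50):
    """Count discordant code positions over the first `positions` repeat units."""
    differences = 0
    for x, y in zip(code_x[:positions], code_y[:positions]):
        if not use_intensity:
            x = WITHOUT_INTENSITY.get(x, x)
            y = WITHOUT_INTENSITY.get(y, y)
        if x != y:
            differences += 1
    return differences


def pairwise_differences(codes, use_intensity=True, positions=50):
    """Return the difference count for every unordered pair of diploid codes."""
    return [
        count_differences(p, q, use_intensity, positions)
        for p, q in combinations(codes, 2)
    ]
```

With 334 individuals, `pairwise_differences` would return 334 x 333 / 2 = 55,611 values, one per pair.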

Figure 6 shows the composition of MS32 repeat units along alleles, determined from the diploid codes of 334
unrelated individuals (668 alleles) (ethnic composition given in Fig. 5 legend). A, frequency of a-, t- and O-type repeat
units at each position along MS32 alleles. Only O-type repeat units which lay inside alleles were scored, after
removing all "O-type" repeats corresponding to code positions beyond the end of short alleles. The mean
proportions of a-, t- and O-type repeats (horizontal lines) are 0.721, 0.265 and 0.0144, averaged over 40,329 repeat units
scored. The Y axis represents the proportion of repeats and the X axis represents the individual repeat unit
positions along MS32 alleles. B, thick line, probability at each repeat position that two individuals match, determined
from all pairwise comparisons of the 334 individuals. Thin line, corresponding probabilities determined after
removing band intensity assumptions by scoring code 1(aa) and 4(aO) as indistinguishable, and code 2(tt) and 5(tO) as
the same. The mean match probabilities per repeat position (dotted lines) are 0.395 and 0.439 respectively. The Y
axis represents the match probability and the X axis represents the individual repeat unit positions along MS32
alleles. C, test for departure from Hardy-Weinberg equilibrium at each repeat position, after converting each repeat
position to a dimorphism ("alleles" a and t+O) and determining the χ2 statistic (1 d.f.) for deviation of observed
genotype frequencies (aa, a(t+O), (t+O)(t+O)) from those expected from the repeat unit composition at each repeat
position. The 5% significance level is given by a dotted line. The Y axis represents chi-square and the X axis
represents the repeat unit number.
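
The Hardy-Weinberg test in panel C can be sketched in Python as below. This is a minimal illustration under stated assumptions: the input is the non-empty list of diploid codes (1-6) observed at one repeat position, the "a" frequency is estimated from those same codes, and the function name is hypothetical.

```python
CHI_SQUARE_5_PERCENT_1_DF = 3.841  # standard 5% critical value for 1 degree of freedom


def hardy_weinberg_chi_square(codes):
    """Chi-square statistic at one repeat position for the dimorphism a versus (t+O).

    Codes 1(aa) count as aa; 3(at) and 4(aO) as a(t+O); 2, 5 and 6 as (t+O)(t+O).
    """
    n = len(codes)
    n_aa = sum(1 for c in codes if c == 1)
    n_het = sum(1 for c in codes if c in (3, 4))
    n_other = n - n_aa - n_het
    p = (2 * n_aa + n_het) / (2 * n)  # frequency of "allele" a at this position
    q = 1 - p
    observed = (n_aa, n_het, n_other)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Classes with zero expectation (monomorphic positions) contribute nothing.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
```

A position would be flagged as departing from equilibrium at the 5% level when the returned value exceeds `CHI_SQUARE_5_PERCENT_1_DF`.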

Figure 7 shows the reconstruction of the MVR codes of individual MS32 alleles by pedigree analysis. 1, incomplete
reconstruction of the haplotypes of all four parental alleles using a single child in an English family (f=father,
m=mother and c=child). The paternal and maternal alleles transmitted to the child are labelled A and C, and the
non-transmitted alleles B and D. Parental haplotypes can be deduced, assuming no recombination between parental
alleles, at all positions except where the father, mother and child are heterozygous for the same repeat unit types
(i.e. all individuals share either code 3(at) or 4(aO) or 5(tO)). Such ambiguities are indicated by "?". 2, complete
reconstruction of allele maps using the large sibship of CEPH family 104 (f=father, m=mother, c=children). The
children show four different diploid codes, corresponding to the four possible combinations of parental alleles, with no
allelic mutation in any of the offspring. The mother unusually contains two short alleles, resulting in coding state
6(OO) beyond the end of the longer allele. The resulting haplotypes (alleles C and D) terminate in a string of "null"
(nonexistent) repeats. Both of these alleles have been mapped by partial digestion with HaeIII (in bold) giving maps
fully consistent with those extracted from the diploid codes. Methods: Haplotypes were extracted using software
written in VAX BASIC V3.4. Diploid codes of the father, mother and each child were entered. For a family with a
single child, the four parental haplotypes were extracted sequentially along each position of the diploid code. For
each position, the codes of the father, mother and child were noted and checked in a look-up table to determine
whether exclusions exist and if not to determine the repeat unit types transmitted or not from each parent to the
child. For example, if the father is 1(aa), mother 3(at) and child 3(at), then no exclusions exist and the repeat units
at that position on each allele are given by allele A, a; allele B, a; allele C, t; allele D, a. Similarly, codes father 3(at)
+ mother 5(tO) + child 4(aO) give A, a; B, t; C, O; D, t. In contrast, codes 1(aa) + 2(tt) + 5(tO) give a paternal
mutation/exclusion. Since there are 7 possible coding states per individual (codes 1-6 plus "?" for repeat positions where
the scoring is uncertain), the look-up table contains 7^3 = 343 entries corresponding to every possible combination
of codes in the mother, father and child. For families with more than one offspring, the incomplete haplotype of each
parental allele was extracted from each child as described above for the single child family. The incomplete
haplotypes from each parent were then compared to identify matching alleles deduced from different children and to
deduce which parental allele had been transmitted to each child. The consensus haplotype of each allele was then
determined from the incomplete haplotypes deduced from each child, thereby removing all uncertain positions.
Finally, the diploid code of each individual was compared with the code predicted from the two constituent alleles,
as a final check to ensure full concordance of all diploid codes and haplotypes. The deduced haplotypes were
stored in an ASCII file to generate an MS32 allele database. Note that this approach also enables allele
reconstruction where one parent is missing, by entering the code of the missing parent as "???...".
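
The per-position look-up for a single-child family can be sketched in Python as follows. Rather than storing the 343-entry table, this hypothetical function enumerates the possible transmissions directly; it covers only the six definite coding states, so the uncertain "?" state and missing parents are outside its scope.

```python
GENOTYPES = {1: "aa", 2: "tt", 3: "at", 4: "aO", 5: "tO", 6: "OO"}


def extract_position(father, mother, child):
    """Deduce repeat types on alleles (A, B, C, D) at one code position.

    A and C are the paternal and maternal alleles transmitted to the child,
    B and D the non-transmitted ones. Returns None when the trio shows an
    exclusion, and '?' for any allele that cannot be resolved.
    """
    f, m = GENOTYPES[father], GENOTYPES[mother]
    target = sorted(GENOTYPES[child])
    solutions = set()
    for i in (0, 1):          # which paternal repeat is transmitted
        for j in (0, 1):      # which maternal repeat is transmitted
            a, b = f[i], f[1 - i]
            c, d = m[j], m[1 - j]
            if sorted(a + c) == target:
                solutions.add((a, b, c, d))
    if not solutions:
        return None
    resolved = []
    for options in zip(*solutions):
        distinct = set(options)
        resolved.append(distinct.pop() if len(distinct) == 1 else "?")
    return tuple(resolved)
```

The worked examples in the legend are reproduced: father 1(aa), mother 3(at), child 3(at) yields A=a, B=a, C=t, D=a; father 3(at), mother 5(tO), child 4(aO) yields A=a, B=t, C=O, D=t; and 1(aa) + 2(tt) + 5(tO) yields an exclusion.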

Figure 8 shows the frequencies of different MS32 alleles defined by MVR mapping. A, incidence of different alleles
in a sample of 254 Caucasian MS32 alleles mapped from separated alleles (Fig. 2) or by pedigree analysis (Fig.
7). N=number of different alleles and T=number of times observed in sample of 254 alleles. The allele database
contained 109 English, 40 French, 95 Mormon, 4 Amish and 6 Venezuelan alleles, of which 100 alleles were deduced
from single child families and were therefore incomplete. 63 ± 5 repeats were determined per allele. All pairwise
combinations of alleles (32,131 comparisons in total) were checked for identity. One pair of indistinguishable
alleles occurs in the CEPH homozygote 3710, and another pair are shared by CEPH Amish parents 88401 and 88402.
B, examples of indistinguishable alleles shared by unrelated individuals. Note that the English allele is incompletely
mapped at positions marked "?". F=French, M=Mormon and E=English. C, examples of similar but non-identical
pairs of alleles, with differences marked X. Similar pairs of alleles were identified by searching all pairwise
comparisons of alleles for pairs with small numbers of differences exist between the MVR codes of a randomly picked pair
of alleles. In total, 10 examples of groups of 2-4 alleles with closely related in-phase MVR maps were identified
among the 254 alleles analysed.

Figure 9 shows the analysis of MS32 mutant alleles detectable in pedigrees. 1, example of CEPH pedigree showing
a child with a mutant allele. Maps of parental alleles A-D were deduced from 7 non-mutant offspring (not
shown). Comparison of the diploid code of child 141606 with the parents shows 4 specifically paternal exclusions
(p) plus 3 ambiguous exclusions (e) which do not indicate the parental origin of the mutant allele. There are no
maternal exclusions, and thus the child has inherited a mutant paternal allele and non-mutant maternal allele. The
diploid code of the child is compatible with the child having inherited maternal allele C but not D. Subtraction of the
code for allele C from the diploid code of the child yields the code for the mutant paternal allele. Comparison of the
mutant allele with paternal alleles A and B indicates that this allele commences with the code of allele A and then
switches to the beginning of the code of allele B after two a-type repeats of unknown origin. This allele therefore
appears to have arisen by unequal crossing over between the two paternal alleles, as indicated, with possible
cross-over sites marked X. F=father, M=mother, ch=child, m=mutant, mu=deduced mutant allele. 2, summary of
mutant alleles detected in the CEPH panel of families, based on the analysis of 286 offspring from large sibships.
This survey has detected all allele length change mutations previously detected by Southern blot analysis of AluI
digests of genomic DNA (Armour et al., 1989), plus three new hitherto-undetected mutations resulting from gains
of a single repeat unit. In all cases, the change in repeat unit copy number is consistent with allele length changes
detected by Southern blot analysis (not shown). mu=mutation, ma=maternal, pa=paternal, CEPH=CEPH
individual, (i)=change in repeat unit copy number, (ii)=detected on Southern blot, (iii)=mechanism and in=intra-allelic. The
mutation rate = 7/572 per gamete = 0.0122 (95% confidence limits, 0.006-0.023). 3, possible locations of unequal
exchange points on the donor allele (the allele which contributes to the beginning of the mutant allele) and the
recipient allele, as shown for mutant e in (1) above. For presumptive intra-allelic (sister chromatid) unequal
exchange, the donor and recipient alleles are identical. D=donor, R=recipient, mu=mutation and rp=repeat position.
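
The subtraction step used in panel 1 to recover the mutant paternal allele can be sketched in Python. The function name and data layout (diploid code as a list of integers 1-6, known allele as a string of 'a', 't', 'O') are assumptions made for illustration only.

```python
GENOTYPES = {1: "aa", 2: "tt", 3: "at", 4: "aO", 5: "tO", 6: "OO"}


def subtract_allele(diploid_code, known_allele):
    """Remove one known allele from a diploid code, returning the other allele.

    Raises ValueError at any position where the known allele is incompatible
    with the diploid code (i.e. the individual cannot carry that allele).
    """
    remaining = []
    for position, (code, repeat) in enumerate(zip(diploid_code, known_allele), start=1):
        genotype = GENOTYPES[code]
        if repeat not in genotype:
            raise ValueError(f"allele incompatible with diploid code at position {position}")
        # Delete a single copy of the known repeat; what is left is the other allele.
        remaining.append(genotype.replace(repeat, "", 1))
    return "".join(remaining)
```

Testing both maternal alleles in turn with such a routine mirrors the legend's reasoning: the one that raises an error (allele D in the example) is rejected, and the other (allele C) yields the mutant paternal code.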

Figure 10 illustrates MVR-PCR analysis of trace amounts of human genomic DNA. An individual was selected from
a collection of 450 typed people, and his identity hidden from the analyst. Three pairs of 100pg aliquots of genomic
DNA from this individual were amplified by MVR-PCR for 28 cycles using 32-TAG-A (A) or 32-TAG-T (T) and the
PCR products detected by Southern blot hybridization. Zero DNA controls gave no signal (not shown). The
incomplete MVR code shown of the unknown individual was established from all repeat positions which gave concordant
typing results in all three analyses. Repeat positions which have ambiguous typing results due to band "drop-out"
were scored as "?", as shown. Note that band intensity fluctuations prevent the discrimination of codes 1(aa) and
4(aO), and codes 2(tt) and 5(tO). An incomplete MVR code which could be determined over the first 45 repeat
positions was then compared against all 450 individuals in the database (allowing for equivalence of codes 1,4 and
codes 2,5). The correct individual was identified as the only database entry which showed a complete match with
the incomplete MVR code.
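
The database search described here can be sketched in Python. This is a hypothetical illustration: codes are lists whose elements are the integers 1-6 or the string "?" for unscored positions, and the database is a plain dictionary keyed by an invented identifier.

```python
# Band intensity is unreliable at trace DNA levels, so 1(aa)/4(aO) and
# 2(tt)/5(tO) are treated as equivalent when matching.
EQUIVALENT = {4: 1, 5: 2}


def position_matches(query, entry):
    """True if two code positions are compatible; '?' matches anything."""
    if query == "?" or entry == "?":
        return True
    return EQUIVALENT.get(query, query) == EQUIVALENT.get(entry, entry)


def search_database(query_code, database):
    """Return identifiers of all entries matching the incomplete query code."""
    return [
        identifier
        for identifier, code in database.items()
        if all(position_matches(q, e) for q, e in zip(query_code, code))
    ]
```

A successful identification in the sense of the legend corresponds to `search_database` returning exactly one identifier.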

Figure 11 illustrates incomplete MVR code information recoverable from mixed DNA samples. Two individuals
(X,Y) were chosen at random from a collection of 450 typed people, and their identities were concealed from the
analyst. 100ng samples of genomic DNA from X and Y or from X plus Y mixed in the indicated proportions, were
amplified by MVR-PCR for 18 cycles and PCR products detected by Southern blot hybridization. S, standard
individual included on all gels. By comparing the MVR-PCR profile of X with the profile of the mixed DNA samples,
possible genotypes of Y can be deduced, as indicated, at all repeat positions where X is not code 3(at), by checking
for A- or T-track specific bands present in the mixture but not X. The MVR code of X and the incomplete and
ambiguous MVR code of Y deduced from the mixed DNA samples were screened across the database of 450 individuals
to reveal, correctly and uniquely, the identities of X and Y.

Figure 12 shows the efficiency of individual identification by MVR-PCR analysis of mixed DNA samples. The
ambiguous MVR code of an "assailant" was deduced from the diploid MVR codes of a "victim" and admixed
"assailant", both selected at random from a database of 334 unrelated individuals (see Fig. 5 legend for ethnic
composition). The ambiguous code was then compared with the MVR codes of each of the 332 other individuals ("false
suspects") in the database, and the number of exclusions over the first 50 repeat units which eliminated each "false
suspect" as assailant were determined. The exclusion frequency distribution is given for a total of 6000 "victim" plus
"assailant" pairs (1,992,000 "false suspects" in total). There were on average 14.2 exclusions per "false suspect",
and only 14 cases in total where a "false suspect" failed to be excluded, giving a mean non-exclusion rate of
7.0x10^-6 per "false suspect". The Y axis represents the number of false suspects and the X axis represents the number of
exclusions in the first 50 repeats. Methods: The ambiguous MVR code of the "assailant" which could be deduced
from a mixed DNA sample was determined by comparing the MVR codes of the "victim" (V) and "assailant" (S) (see
Figure 11). For example, if both V and S are code 1(aa), then neither V nor a V+S mixture will show a band in the
T-track, whence the ambiguous code deducible for S from the mixture, given that the code of V is known, is code
1(aa), 4(aO) or 6(OO). Similarly, if V is code 1(aa) and S is code 3(at), then the mixture, but not V, will show a band
in the T-track, whence the ambiguous code deducible for S is code 2(tt), 3(at) or 5(tO). In contrast, if the victim is
code 3(at), then in the V+S mixture both the A- and T-tracks will contain a band, irrespective of the genotype of S,
and thus the deduced code of S at that repeat position will be totally ambiguous. The incomplete code of S
deducible from each V+S combination was then compared with each non-S database entry ("false suspect", FS) to
determine whether definitive exclusions exist. For example, if the ambiguous code of S is code 1(aa), 4(aO) or 6(OO) and
if FS is code 4(aO), then no exclusion exists; in contrast, if FS is code 3(at), then an exclusion is scored. This
analysis only uses information on the presence or absence of bands from the A- and T-tracks, and does not include
additional information on relative band intensities in V and the V+S mixture.
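
The deduction of the assailant's ambiguous code from band presence or absence can be sketched in Python. The function names are hypothetical; like the analysis in the legend, the sketch ignores band intensity and uses only which tracks show a band.

```python
GENOTYPES = {1: "aa", 2: "tt", 3: "at", 4: "aO", 5: "tO", 6: "OO"}


def bands(code):
    """Tracks ('a' and/or 't') showing a band for a code; O-type repeats give none."""
    return set(GENOTYPES[code]) - {"O"}


def possible_assailant_codes(victim, mixture_bands):
    """Codes of S compatible with the bands seen in the V+S mixture, given V."""
    return {s for s in GENOTYPES if bands(victim) | bands(s) == mixture_bands}


def count_exclusions(victim_code, assailant_code, suspect_code, positions=50):
    """Number of positions at which a suspect is excluded as the assailant."""
    exclusions = 0
    trio = zip(victim_code[:positions], assailant_code[:positions], suspect_code[:positions])
    for v, s, fs in trio:
        mixture = bands(v) | bands(s)  # what the mixed sample would show
        if fs not in possible_assailant_codes(v, mixture):
            exclusions += 1
    return exclusions
```

The three cases in the legend follow directly: a victim of code 1(aa) with no T-track band in the mixture leaves codes 1, 4 or 6 for S; a T-track band leaves 2, 3 or 5; and a victim of code 3(at) leaves all six codes possible.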

Figure 13 shows the efficiency of MS32 diploid codes in paternity testing. Diploid codes extending for at least 50
repeat units were obtained from 115 Caucasian mother-father-child trios. For each trio, the father was removed and
replaced sequentially by each of 249 different Caucasian individuals ("non-fathers"). The MVR codes of each
mother-child plus non-father trio were analysed over the first 50 repeat units to determine the total number of repeat
unit positions which gave an exclusion, plus the number of paternal-specific exclusions and the number of
exclusions which were directionally ambiguous (e.g. mother and non-father both code 1(aa), child code 3(at)). A,
frequency distribution of paternal-specific exclusions (filled bars), ambiguous exclusions (shaded bars), and total
exclusions (open bars) for each of the 28,635 combinations of mother-child and non-father. The mean number of
exclusions was 4.67, 5.19 and 9.86 per child, respectively. The overall proportion of non-fathers showing no
exclusions, or no paternal-specific exclusions, was 0.00229 and 0.0113 respectively. The Y axis represents the number
of cases and the X axis represents the number of exclusions in the first 50 repeats. B, frequency distributions as in
(A) determined after eliminating all child code positions containing an O-type repeat (code 4, aO; code 5, tO; code
6, OO). On average, 3.93 paternal-specific, 4.30 ambiguous and 8.23 total exclusions were obtained per trio.
0.00534 of non-fathers showed no exclusions, and 0.0414 showed no paternal-specific exclusions. The Y axis
represents the number of cases and the X axis represents the number of exclusions in the first 50 repeats. C, variation
in the number of non-fathers excluded for each of the 115 mother-child combinations. Filled bar, non-fathers
eliminated by paternal-specific exclusions. Shaded bar, men showing paternal-specific exclusions after elimination of all
O-type repeat positions in the child's code. Hatched bar, non-fathers showing any exclusion (paternal-specific plus
ambiguous). Open bar, non-fathers with any exclusion after elimination of O-type repeats from the child. The Y axis
represents the number of cases and the X axis represents the proportion of non-fathers excluded.
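
One way to classify exclusions at a single code position is sketched below in Python. The classification rule is an assumption inferred from the examples in the legends, not a rule stated in the source: an exclusion is treated as paternal-specific when the non-father shares no repeat type with the child while the mother does, and as ambiguous otherwise.

```python
GENOTYPES = {1: "aa", 2: "tt", 3: "at", 4: "aO", 5: "tO", 6: "OO"}


def classify_position(mother, non_father, child):
    """Return 'none', 'paternal' or 'ambiguous' for one repeat position.

    Assumed rule (hypothetical): paternal-specific means the tested man could
    not have contributed either of the child's repeats, while the mother could.
    """
    m, f, c = GENOTYPES[mother], GENOTYPES[non_father], GENOTYPES[child]
    consistent = any(sorted(x + y) == sorted(c) for x in m for y in f)
    if consistent:
        return "none"
    father_shares = any(y in c for y in f)
    mother_shares = any(x in c for x in m)
    if mother_shares and not father_shares:
        return "paternal"
    return "ambiguous"


def tally_exclusions(mother_code, non_father_code, child_code, positions=50):
    """Count paternal-specific and ambiguous exclusions over the first positions."""
    counts = {"paternal": 0, "ambiguous": 0}
    trio = zip(mother_code[:positions], non_father_code[:positions], child_code[:positions])
    for m, f, c in trio:
        outcome = classify_position(m, f, c)
        if outcome != "none":
            counts[outcome] += 1
    return counts
```

Under this rule the legend's example (mother and non-father both 1(aa), child 3(at)) is scored as ambiguous, since either adult could have supplied the a-type repeat.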

Figure 14 shows similarities between different MS32 alleles. A, identification of related alleles. Every pairwise
comparison of the MVR haplotypes of 326 different Caucasian alleles was analysed for the proportion of matching
repeat positions and the proportion of matches which were a-type repeats. For each pair of alleles, comparisons
were repeated for alleles misaligned up to ± 10 repeat units out of register (total 2.2x10^6 comparisons). The plot
shows data on 10,000 such comparisons and the presence of a separate grouping of allele pairs to the right of the
diagonal line which show significant pairwise similarity. The Y axis represents the proportion of a-type repeats in
matches and the X axis represents the proportion of matches. B, examples of groups of related alleles so identified,
with gaps (-) introduced to improve alignment. M=Mormon, F=French and B=British individuals. Haplotypic MVR
map segments shared by related alleles are shown in uppercase and divergences by lowercase. Additional
haplotypes shared by some grouped alleles are underlined. Some alleles have been mapped using single offspring and
therefore show uncertain positions.

Figure 15 shows the use of tailed primers and TAG sequences in the simultaneous detection of cystic fibrosis (CF)
mutations. In (a) the mutations are indicated as G551D and R553X in exon 11 of the CFTR gene. In (b) the
extended R553X primer (indicated as R553X) is bound and masks the G551D target so that the shorter G551D primer
(indicated as G551D) is blocked. In (c) both the G551D and R553X primers are bound and a tail
sequence can be used to increase the length of the R553X primer.

Figure 16 illustrates convenient arrangements for diagnostic primers (DP1-3) and corresponding amplification
primers (AP1-3) used in over ARMS. In (i), (iii) and (iv) the primers (DP1-DP3) are provided on the same target base
sequence. In (iii) and (iv) the primers comprise a complementary overlap. In (ii) the primers comprise a
complementary overlap but are on different target base sequences.

Figure 17 shows minisatellite allele repeat coding and detection of N-type repeats by MVR-PCR on total genomic
DNA. Each individual DNA sample (1-5) was amplified using 32-TAG-A (A), 32-TAG-T (T) or 32-TAG-N (N)
(sequences given in Table 3 hereinafter), together with driver primers 32-O and TAG. PCR products were resolved
by agarose gel electrophoresis and detected by Southern blot hybridisation with 32P-labelled MS32 repeat probe.
The vertical scale 1-10-20-30-40-50-60- indicates the code positions for the individual repeat units. Arrows indicate
positions heterozygous for U-type null repeats not amplified by 32-TAG-N. Individual 4 has two short alleles, one
terminating at position 27 and the other at position 86 (marked with circles).

Figure 18 shows the distribution of null repeats in MS32 alleles. A: the incidence of N-type and U-type null repeat units
at each position over the first 50 repeat units of 391 different MS32 alleles (331 Caucasian and 60 Japanese). The
Y axis indicates the number of null repeat units and the X axis indicates the repeat unit positions. The black portion
of each individual bar represents N-type units and the white portion represents U-type units. B: variation in the
number of null repeat units within the first 50 repeats of 391 different MS32 alleles. Distributions are shown for
N-type, U-type and total O-type repeats. The Y axis represents the number of alleles and the X axis represents the
number of null repeat units per allele. The black bars represent N-type units, white bars represent U-type units and
hatched bars represent O-type units. C: examples of English alleles (E) containing unusual arrangements of null
repeat units.

Figure 19 shows examples of groups of aligned alleles containing null repeats. Groups of alignable alleles were
identified as previously described, allowing for misalignments between the beginnings of different alleles. Common
haplotypic segments shared by different alleles are shown in uppercase, and the positions of null repeat units are
indicated by V for N-type repeats and for U-type repeats. High-order repetitive structures within alleles are
arrowed. Some allele maps were derived from the diploid codes of single child-mother-father trios and thus
contain ambiguous positions (marked "?"). E=English, F=French, J=Japanese, M=Mormon, B=Bangladeshi.

Figure 20 shows the efficiency of MS32 diploid codes and the effects of null repeats in paternity testing. Diploid
codes extending for at least 50 repeat units were obtained from 141 Caucasian mother-father-child trios. For each
trio, the father was removed and replaced sequentially by each of 302 different Caucasian individuals
("non-fathers"). The MVR codes of each mother-child plus non-father trio were analysed over the first 50 repeat units to
determine the total number of repeat unit positions which gave an exclusion, plus the number of paternal-specific
exclusions and the number of exclusions which were directionally ambiguous (e.g. mother and non-father both a/a,
child a/t). A: frequency distribution of paternal-specific exclusions (filled bars), ambiguous exclusions (shaded bars),
and total exclusions (open bars) for each of the 42,582 combinations of mother-child and non-father. The mean
number of exclusions was 4.67, 5.19 and 9.86 per child, respectively. The overall proportion of non-fathers showing
no exclusions, or no paternal-specific exclusions, was 0.14% and 2.5% respectively. The Y axis represents the
number of cases and the X axis indicates the number of exclusions in the first 50 repeat units. B: frequency
distributions as in (A) determined after eliminating all child code positions heterozygous for an O-type repeat (a/O, t/O).
On average, 3.93 paternal-specific, 4.30 ambiguous and 8.23 total exclusions were obtained per trio. 0.50% of
non-fathers showed no exclusions, and 4.1% showed no paternal-specific exclusions. The axes are identical. C:
variation in the number of non-fathers excluded for each of the 141 mother-child combinations. Filled bar, non-fathers
eliminated by paternal-specific exclusions. Shaded bar, men showing paternal-specific exclusions after elimination
of all heterozygous O-type repeat positions in the child's code. Hatched bar, non-fathers showing any exclusion
(paternal-specific plus ambiguous). Open bar, non-fathers with any exclusion after elimination of O-type repeats
from the child. The Y axis represents the number of cases and the X axis represents the proportion of non-fathers
excluded.

Figure 21 shows a diagrammatic representation of the MS32 5'-flanking region, showing polymorphic sites and
PCR primers. Filled circles represent polymorphic base substitutions, open squares non-polymorphic restriction
sites and filled squares polymorphic restriction sites. Arrows indicate PCR primers. PCR primer sequences are:

32-OR   5'- tcaccggtgaattcACCACCCTTCCCACCAAACTACTC -3',
32-H2AR 5'- GTGCAGTCCCAACCCTAGCCA -3',
32-H2C  5'- TGATGCGTCGTTCCCGTATC -3',
32-D2   5'- CGACTCGCAGATGGAGCAATG -3',
32-D    5'- CGACTCGCAGATGGAGCAATGGCC -3',
32-H1C  5'- TGGTGCTGCAAAAGAAATAC -3',
32-H1B  5'- TTTGGTGCTGAAAAGAAAG -3',
32-NR   5'- AGTAGCCAATCGGAATTAGC -3' and
32-B    5'- TAAGCTCTCCATTTCCAGTTTCTGG -3'.

32-OR carries a 14bp 5' extension (lower case, incorporating cloning sites) which is nonessential for the work
described here. M represents the MS32 minisatellite.

Figure 22 shows PCR assays for the three polymorphic sites identified in the flanking region of MS32. A:
Segregation analysis of the Hump1 polymorphism for CEPH family 1416. SSCP analysis top and PCR assay bottom
(genotypes for each individual are shown, GG/GC/CC). PCR-SSCP analysis of the flanking DNA amplified with
32-OR and 32-B was performed using the method of Orita et al., 1989, incorporating α-32P-dCTP during PCR,
followed by digestion with HinfI and electrophoresis through a 5% polyacrylamide, 10% glycerol gel in 1x TBE at 4°C.
For the direct Hump1 PCR assay 0.1µl of 32-OR - 32-B PCR product was reamplified using the nested primers
32-H1B and 32-NR for 28 cycles with an annealing temperature of 55°C and an extension time of 2 minutes. 5µl of
this amplification was digested with Bsp1286I and resolved by gel electrophoresis through a 3% NuSieve GTG, 1%
Sigma Type I agarose gel in 1x TBE and the products visualised by ethidium bromide staining. Zero DNA controls
(O) and φX174 HaeIII size markers (φ) are also shown. B: Segregation analysis of the Hf polymorphism for CEPH
family 1331. 348bp of immediate flanking DNA was amplified using primer pair 32-OR plus 32-B for 30 cycles with
an annealing temperature (A) of 69°C and a 2 minute extension time (E). 2µl of PCR products were digested with
HinfI and resolved by electrophoresis as above. All individuals produce a constant 163bp product. Individuals
homozygous for the Hf- allele (--) produce a product of 199bp. In individuals homozygous for the Hf+ allele (++) the
199bp band is further digested to give bands of 141 and 58bp. Heterozygous individuals (Hf+/Hf-) produce all four
bands (+-). C: Segregation analysis of the Hump2 polymorphism for CEPH family 1421 and four unrelated
individuals (1-4; genotypes for each individual are shown, CC/CT/TT). Hump2 analysis was achieved using primers
32-OR, 32-H2C, 32-H2AR and 32-B at final concentrations of 0.5, 0.5, 2 and 1µM respectively in a single tube assay.
PCR was performed with an annealing temperature (A) of 67°C, an extension time (E) of 2 minutes for 30 cycles
and the products resolved by agarose gel electrophoresis as above.

Figure 23 shows primate sequence comparisons for the MS32 flanking region. The sequence of the human clone
between primer pair 32-OR and 32-B is given in full (Wong et al., 1987). The human-African Ape ancestral
sequence was derived from the human, Chimp, Gorilla and Orang-utan sequences (Gray and Jeffreys, 1991),
using Orang-utans as the outgroup. HC=Human Clone, AS=Ancestral Sequence and HV=Human Variant.
Positions of variation only are indicated in bold. N's in the ancestral sequence represent sequence not known.

Figure 24 shows 'Knockout' MVR-PCR. For each of the flanking polymorphisms (Hump1, Hf and Hump2) three
unrelated individuals (1-3) were chosen who were heterozygous for the polymorphism. Each individual was analysed
by MVR-PCR using either the universal flanking primer 32-O (O) to generate the diploid code from both alleles or
the allele specific flanking primer (32-H1C, 32-D2 or 32-H2C) to generate coding from a single allele. MVR-PCR
products extending to a-type repeats (T) were resolved by agarose gel electrophoresis and Southern blot
hybridisation using 32P-labelled MS32 as probe. The 10th repeat unit on the MVR-PCR ladder is arrowed to show
registration of single allele and diploid codes.
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Figure 25 shows the application of knockout MVR-PCR to mixed DNA samples. MVR-PCR of mixtures of two 
DNAs using aJlele-specific flanking primers 32-D2 or 32-H2C in an otherwise standard MVR-PCR reactions. Indi- 
vidual X f victim') was a Hf homozygote (Hf/Hf) and a H2 T homozygote (H2 T /H2 T ) and individual Y f assailant') 
was a Hf heterozygote (Hf+/Hf-) and a H2 C homozygote (H2°/H2 C ). Mixtures of DNA from X and Y were prepared 

5 using a fixed amout of X (150ng) and decreasing amounts of Y (150ng down to 0.75ng). The most dilute samples 

of Y (1/1 00 + 1/200) were gjven a further 2 cycles of PCR to increase the yield of product to detectable levels. The 
figures 0.5, 0.25. 0.01. 0.005, 0.001, 0.0005 represent the proportion of Y in the X+Y mixture. 
Figure 26 shows the efficiency of single allele codes in excluding individuals based on comparison with their diploid
codes. Single allele codes extending over at least 50 repeat units were established for 411 different MS32 alleles
(349 Caucasian and 62 Japanese). Each allele was then compared with the diploid code of each of 408 unrelated
individuals, giving 167,688 allele/individual comparisons in total. For each comparison, repeat unit positions which
excluded the allele as having come from the individual were identified; for example, an allele with a t-type repeat
unit at a given position could not have come from an individual homozygous for a-type repeats at that position. The
frequency distribution of the total number of exclusions over the first 50 repeat units is given for all allele/individual
comparisons. The Y axis represents the number of cases and the X axis represents the number of exclusions.

Figure 27 shows the organization of the MS31 locus and the localisation of PCR primer sites and the flanking AluI
+/- site polymorphism. Primer sequences (5'-3') are as follows:

A = 31A    = CCCTTTGCACGCTGGACGGTGGCG
B = 31B    = CCCACACGCCCATCCGGCCGGCAG
C = 31C    = GGCACAACCTAGGCAGGGGAAGCC
D = 31D    = CCCCACACCGGCACACCGTC
E = 31E    = GGACAGCCAAGGCCAGGTCC
F = 31F    = CCACTCGGAACCACCTGCAG
31OR       = GGAGGGGCCATGAAGGGGAC
31Alu+     = CATGAAGGGGACTGGCCTTA
31Alu-     = CATGAAGGGGACTGGCCTTG
31-Tag-A   = A GGTGGAGGGTGTCTGTGAggcctggtacctgcgtact
31-Tag-G   = G GGTGGAGGGTGTCTGTGAggcctggtacctgcgtact
Tag        = aggcctggtacctgcgtact

The relevant 20bp MS31 repeat units are ACCCACCTCCCACAGACACT and GTCCACCTCCCACAGACACT
respectively.

Figure 28 shows MVR-PCR analysis of single MS31A alleles. For each allele, amplification was performed with
31-Tag-A to reveal the position of a-type repeats (a track) and 31-Tag-G to map t-type repeats (t track). The bracketed
region in allele 6 shows examples of band intensity fluctuation.

Figure 29 shows the identification of related MS31A alleles by dot matrix analysis. The MVR codes of 34 different
Caucasian alleles were assembled into a continuous "sequence" with each allele followed by padding to increase
its length to 100 repeats (3400 "repeats" in total "sequence"). The dot matrix shows this complete sequence compared
with itself to search for 8-repeat perfect matches. Related alleles generate short diagonals off the main diagonal.
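
The dot matrix comparison described for Figure 29 can be sketched as below. The sketch assumes allele codes are strings over a, t and O, uses '-' as the padding symbol, and reports only the off-diagonal matches; the names `concatenate` and `dot_matrix_matches` and the two short alleles are hypothetical and serve only to illustrate the idea.

```python
def concatenate(alleles, padded_length=100, pad="-"):
    """Join allele codes into one 'sequence', padding each to a fixed length."""
    return "".join(code.ljust(padded_length, pad) for code in alleles)


def dot_matrix_matches(sequence, window=8):
    """Return (i, j) pairs with i < j where the window of repeats starting at i
    is identical to the window starting at j. Windows touching padding are skipped."""
    positions = {}
    for i in range(len(sequence) - window + 1):
        word = sequence[i:i + window]
        if "-" in word:
            continue
        positions.setdefault(word, []).append(i)
    hits = []
    for starts in positions.values():
        for x in range(len(starts)):
            for y in range(x + 1, len(starts)):
                hits.append((starts[x], starts[y]))
    return sorted(hits)


# Two hypothetical alleles sharing a segment offset by two repeats.
seq = concatenate(["atatttaatta", "ttatatttaat"], padded_length=20)
print(dot_matrix_matches(seq))  # consecutive hits form a short diagonal
```

Runs of consecutive hits such as (i, j), (i+1, j+1) correspond to the short diagonals that mark related alleles.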

Figure 30 shows three MS31A alleles showing related segments (uppercase) in their MVR codes. a = a-type
repeat; t = t-type; O = null or O-type repeat. Indicated are MVR codes for (i) lower allele of CEPH individual 133413,
(ii) lower allele of CEPH individual 1329912 and (iii) lower allele of CEPH individual 6602.

Figure 31 shows digital coding of genomic DNA by duplex MVR-PCR. Each PCR reaction contained 100ng
genomic DNA from individuals (1-9) and 1µM Tag, 1µM flanking primers 31A and 32OR, plus 40nM 31-Tag-A, 10nM
32-Tag-C (a-track) or 20nM 31-Tag-G, 20nM 32-Tag-T (t-track). After 21 cycles of MVR-PCR, PCR products were
resolved by agarose gel electrophoresis and detected by Southern blot hybridisation with 32P-labelled MS31.

Figure 32 shows the results of probe stripping the gel shown in Figure 31 and re-probing with 32P-labelled MS32.
Figure 33 shows a PCR assay of the AluI site polymorphism flanking MS31A. 100ng samples of genomic DNA
from 17 different individuals were amplified in 7µl PCR reactions with 1µM 31-Tag-A, 1µM flanking primer 31A and
0.25 units of Taq polymerase (Amersham) for 35 cycles of 96°C for 1.3 min., 70°C for 1 min. per cycle, followed by a
chase of 67°C for 1 min., 70°C for 10 min. The unpurified PCR products were then digested with 5 units AluI in the
presence of 1mM spermidine trichloride and the appropriate buffer and electrophoresed through a 5% NuSieve
(FMC) agarose gel in 40mM Tris-acetate (pH 8.3), 0.2mM EDTA. DNA was visualised by staining with ethidium bromide.
EXAMPLE 1

MVR-PCR on separated MS32 alleles

To determine the feasibility of MVR-PCR, genomic DNA from individuals known to contain large MS32 alleles was
cleaved with Sau3AI and alleles separated by preparative gel electrophoresis. Each separated allele was amplified with
1µM 32D plus 1µM TAG primers in the presence of increasing concentrations of 32-TAG-A or 32-TAG-T. PCR products
were resolved by agarose gel electrophoresis and detected by Southern blot hybridization with 32P-labelled MS32
minisatellite probe. The yield of PCR products increased with increasing concentration of MVR-specific primers, but at high
concentrations, the products progressively shortened due to internal priming (data not shown). At optimal primer
concentrations (10nM 32-TAG-A, 20nM 32-TAG-T), complementary ladders of PCR products extending >3kb into each
allele were generated, from which allele binary codes could be readily deduced (Fig. 2). Minimal mispriming of
MVR-specific primers occurred off the wrong repeat units at annealing temperatures above 64°C (data not shown).

In most cases, the two MVR-specific primers generated a continuous complementary series of products. Occasionally,
however, a "rung" on the MVR coding ladder failed to be amplified by either MVR-specific primer (Fig. 2), indicating
the presence of "null" repeats containing an additional sequence variant 3' to the HaeIII site which blocks priming by
either primer. 1.6% of repeat units scored from 32 separated Caucasian alleles were null or O-type repeats.

EXAMPLE 2 

Authenticity of allele MVR codes generated by MVR-PCR

To determine whether these allele codes agreed with codes established by partial digestion with HaeIII, PCR products
2.5-3.0 kb long from MVR-PCR were size-selected by agarose gel electrophoresis, re-amplified using primer 32D
plus TAG until >10ng product was generated, and mapped as described previously (Jeffreys et al., 1990), by end-labelling
at the 32D primer site and partial digestion with HaeIII. 32 different MS32 alleles so mapped gave fully concordant
results using both approaches. Representative examples of MVR codes are given in Fig. 3A, and a summary of MVR
composition of alleles in Fig. 3B. The rare null or O-type repeats correspond to both HaeIII resistant repeats. Other than
O-type repeats, all a- and t-type repeats were fully concordant, with only one exception, namely one allele containing a
single repeat amplified by 32-TAG-A but not cleaved by HaeIII (Fig. 3A). This repeat presumably contains a rare variant
at the first base of the HaeIII site which destroys the restriction site but does not affect priming by 32-TAG-A (Fig. 3C).

EXAMPLE 3 

MVR-PCR on total genomic DNA

MVR-PCR on genomic DNA should produce a profile of both alleles superimposed, to generate for two-variant alleles
a ternary code (Fig. 1A), where each rung in the ladder can be coded as 1 (both alleles a-type at that position, aa),
2 (both t-type, tt) or 3 (heterozygous, at). The presence of O-type repeats creates three additional coding states,
namely 4(aO), 5(tO) and 6(OO). The last will appear as a gap on the ladder and the first two as relatively faint bands
specifically in the A- or T-track. Coding states 4-6 will also be generated if one allele is short; beyond the end of the short
allele, the code will be derived from only one allele, and if both alleles are short, then no PCR products will appear
beyond the longer allele, generating a 66666... code.
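
The six coding states can be illustrated by superimposing two single-allele codes. The following is a minimal sketch under the assumption that alleles are written as strings of a, t and O, and that positions beyond the end of a short allele behave like null (O) repeats, as the text describes; the function name `diploid_code` is hypothetical.

```python
# Unordered pair of repeat types at one rung -> diploid coding state.
PAIR_TO_CODE = {
    frozenset("a"): "1",   # aa
    frozenset("t"): "2",   # tt
    frozenset("at"): "3",  # at
    frozenset("aO"): "4",  # aO
    frozenset("tO"): "5",  # tO
    frozenset("O"): "6",   # OO
}


def diploid_code(allele1, allele2):
    """Superimpose two allele codes rung by rung to give the diploid code."""
    length = max(len(allele1), len(allele2))
    # Beyond the end of a short allele no product is made, so pad with O.
    a1 = allele1.ljust(length, "O")
    a2 = allele2.ljust(length, "O")
    return "".join(PAIR_TO_CODE[frozenset((x, y))] for x, y in zip(a1, a2))


print(diploid_code("atatO", "aatt"))  # second allele is one repeat shorter
print(diploid_code("at", "atta"))     # first allele is short
```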

To investigate the feasibility of MVR-PCR on total genomic DNA, 0.1µg samples of human DNA were amplified and
products detected by Southern blot hybridization with MS32 minisatellite probe (Fig. 4). In each case, clear and unambiguous
diploid codes could be read at least 50 repeat units into the minisatellite. The two tracks generating the code
contain considerable informational redundancy; thus in almost all cases, an intense band in the A-track was matched
by no band in the T-track (code 1, aa), a faint A band by a faint T band (code 3, at) and no A band by an intense T band
(code 2, tt). This dosage phenomenon not only provides a detailed check on the authenticity of the code generated, but
also makes it possible to identify with good reliability any rung positions which are heterozygous for a null or O-type
repeat (code 4, aO; code 5, tO); examples of such positions are shown in Fig. 4.
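
The relationship between band intensities in the two tracks and the coding state can be written as a lookup table. This sketch assumes that each band has already been classified by eye or by densitometry as "strong", "faint" or "none"; those labels, the function name `score_rung` and the use of '?' for combinations that do not fit the expected dosage pattern are assumptions for illustration.

```python
# (A-track band, T-track band) -> diploid coding state
BAND_CODES = {
    ("strong", "none"): "1",  # aa: intense A band, no T band
    ("none", "strong"): "2",  # tt: no A band, intense T band
    ("faint", "faint"): "3",  # at: faint band in each track
    ("faint", "none"): "4",   # aO: faint band in the A-track only
    ("none", "faint"): "5",   # tO: faint band in the T-track only
    ("none", "none"): "6",    # OO: gap on the ladder
}


def score_rung(a_band, t_band):
    """Return the coding state for one rung, or '?' if the pattern is unexpected."""
    return BAND_CODES.get((a_band, t_band), "?")


rungs = [("strong", "none"), ("faint", "faint"), ("none", "faint"), ("strong", "strong")]
print("".join(score_rung(a, t) for a, t in rungs))
```

The last rung in the usage example shows why the redundancy is useful: two intense bands do not correspond to any expected state and are flagged rather than forced into a code.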

EXAMPLE 4 

Individual variation in diploid codes

The MVR-PCR profiles shown in Fig. 4 clearly show extreme variation between individuals. To investigate further
the degree of variability of these codes, a panel of 334 unrelated individuals was typed by MVR-PCR. All diploid codes
were read from the second repeat unit into the minisatellite, since the first repeat is faint and cannot be reliably scored.
The start position for reading the code was confirmed by running a standard individual of known code on all gels; the
standard individual also provided a check that the correct codes were being generated on a given gel. If there was any
doubt whatsoever about the coding state at any given repeat in an individual, the rung was coded as '?'. Only 0.3% of
code positions (59 rungs in 20,702 scored) were entered as '?' and were ignored in subsequent database searches.

Every pairwise comparison of the 334 diploid MVR codes over the first 50 repeat units scored revealed that no two
individuals typed shared the same code (Fig. 5). The individual specificity remained when band intensity information
was removed by converting all code 4(aO) and code 5(tO) positions to codes 1(aa) and 2(tt) respectively, to generate
quaternary codes (1,2,3,6) corresponding to a band present only in the A-track, only in the T-track, in both tracks and
in neither track, respectively. There were on average 30 mismatches per pair of individuals over the first 50 repeat units,
with a smooth distribution of mismatch frequencies over all 55,611 pairwise comparisons in the population database
(Fig. 5A,B). All individuals could in fact be distinguished using information from just the first 17 repeat unit positions. The
two most similar individuals had MVR codes dominated by code 1(aa), indicating that all four alleles in these individuals
were composed largely of a-type repeats; such homogeneous alleles have been noted previously (Jeffreys et al., 1990).
The most dissimilar pairs of individuals arose where one individual contained a short allele, creating a diploid code
dominated by the rare codes 4, 5 and 6 (Fig. 5C). Individuals with short alleles create the shoulder of high numbers of
repeat unit differences on the frequency distribution shown in Fig. 5A, and as predicted, this shoulder is eliminated on
removal of band intensity assumptions. 7.8% of individuals contained short (<50 repeats) alleles with allele lengths
ranging from 19 to 44 repeat units. Short alleles do not occur with equal frequency in all populations; thus 5.6% of Caucasian
individuals contain short alleles, compared with 23% of Japanese (p<0.001).
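
The pairwise comparison and the conversion to quaternary codes can be sketched as follows. The function names, the string representation of codes and the rule that uncertain rungs (written '?') never count as mismatches are assumptions for illustration; the short codes in the usage lines are invented.

```python
from itertools import combinations

# Removing band intensity information: 4(aO) -> 1(aa), 5(tO) -> 2(tt).
TO_QUATERNARY = str.maketrans({"4": "1", "5": "2"})


def count_mismatches(code_x, code_y, n_positions=50):
    """Number of rungs at which two diploid codes differ."""
    return sum(
        1
        for x, y in zip(code_x[:n_positions], code_y[:n_positions])
        if "?" not in (x, y) and x != y
    )


def all_pairwise(codes, n_positions=50, quaternary=False):
    """Mismatch counts for every pair of individuals in a database of codes."""
    if quaternary:
        codes = [c.translate(TO_QUATERNARY) for c in codes]
    return [count_mismatches(x, y, n_positions) for x, y in combinations(codes, 2)]


database = ["1134", "1131", "2236"]  # hypothetical 4-rung codes
print(all_pairwise(database))                   # full six-state comparison
print(all_pairwise(database, quaternary=True))  # band presence/absence only
```

For a database of 334 codes the list returned has 334 x 333 / 2 = 55,611 entries, matching the number of comparisons quoted above.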


EXAMPLE 5 

Heterozygosity levels at MS32 determined from MVR codes 

Diploid codes provide a more objective method for identifying homozygotes than allele length measurements by
conventional Southern blot hybridization analysis of genomic DNA. Presumptive homozygotes will show diploid MVR
codes restricted to code 1(aa), 2(tt) and 6(OO), with no heterozygous repeat positions. Three individuals (one French,
two Japanese) out of 334 surveyed showed homozygosity by this criterion, suggesting a mean heterozygosity level of
99.1%. It is possible that such apparent homozygotes are in fact heterozygous for a second allele which contains a 32D
primer mismatch in the flanking DNA (Fig. 1C), preventing PCR amplification. However, all individuals scored as
homozygous by MVR-PCR showed as predicted a single band on Southern blot hybridization of genomic DNA (data not
shown). Conversely, the majority (8/10) of apparently single band individuals detected by hybridization with MS32 were
in fact heterozygous for similar or identical length alleles as shown by diploid coding, again establishing that the level of
variability at MS32 is substantially greater than can be resolved by conventional allele length analysis.
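
The homozygosity criterion and the resulting heterozygosity estimate can be expressed directly. The sketch below assumes codes are stored as strings and that uncertain rungs ('?') do not count against homozygosity; both the function names and that choice are assumptions for illustration.

```python
def is_presumptive_homozygote(diploid):
    """True if every rung is 1(aa), 2(tt) or 6(OO), i.e. no heterozygous rung."""
    return all(state in "126?" for state in diploid)


def heterozygosity(codes):
    """Fraction of individuals showing at least one heterozygous rung."""
    homozygotes = sum(is_presumptive_homozygote(c) for c in codes)
    return 1 - homozygotes / len(codes)


# Three homozygotes among 334 individuals, as reported above.
print(round(100 * (1 - 3 / 334), 1))  # 99.1
```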


EXAMPLE 6 

Variation in repeat unit composition along diploid MVR codes 

The variation in allele MVR composition along the MVR code was extracted from the database of 334 individuals
(668 alleles) (Fig. 6A). The relative frequency of a- and t-type repeat units is fairly uniform at all positions along the
code, whereas the frequency of O-type repeats within alleles tends to increase with distance into the alleles; this may
reflect reduced levels of homogenization at the relatively invariant distal ends of MS32 alleles allowing additional repeat
variants to arise by mutation within repeat units and to survive elimination by processes such as crossover fixation. The
probability that two individuals would match at a given MVR code position is also fairly constant along the code (Fig.
6B) and is correctly predicted from the relative frequencies of a-, t- and O-type repeats at each position. This in turn
implies that each repeat position, treated as a triallelic locus (a, t, O), is in Hardy-Weinberg equilibrium in the population.
χ² tests show this to be the case (Fig. 6C). However, Hardy-Weinberg equilibrium at each position does not imply linkage
equilibrium between different positions; indeed, there is clear evidence for major disequilibrium between these
essentially completely linked repeat unit positions, as shown for example by the existence of alleles largely homogenized
for a-type repeats (Fig. 5C; see Jeffreys et al., 1990), non-random dispersal of O-type repeats over alleles and
individuals (Fig. 3A), and the existence of distinct but very closely related alleles (see below).
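
The prediction of the match probability from repeat-type frequencies follows from the standard Hardy-Weinberg genotype frequencies for a triallelic locus. The sketch below shows that calculation; the frequencies in the usage line are invented for illustration and are not taken from Fig. 6.

```python
def genotype_frequencies(p_a, p_t, p_o):
    """Hardy-Weinberg frequencies of the six coding states at one rung."""
    return {
        "1": p_a ** 2,        # aa
        "2": p_t ** 2,        # tt
        "3": 2 * p_a * p_t,   # at
        "4": 2 * p_a * p_o,   # aO
        "5": 2 * p_t * p_o,   # tO
        "6": p_o ** 2,        # OO
    }


def match_probability(p_a, p_t, p_o):
    """Probability that two unrelated individuals share the same coding state."""
    return sum(f ** 2 for f in genotype_frequencies(p_a, p_t, p_o).values())


# Hypothetical frequencies for a single rung.
print(match_probability(0.70, 0.28, 0.02))
```

Because different rungs are in strong linkage disequilibrium, as noted above, per-rung probabilities of this kind cannot simply be multiplied along the code.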

EXAMPLE 7 

Extraction of allele MVR maps from pedigrees

The variability of diploid MVR codes is governed by the number and frequencies of different MS32 alleles in human
populations. MS32 alleles can be mapped using electrophoretically-separated alleles (Fig. 2). However, this approach
is cumbersome, and a far simpler approach is to deduce allele haplotypes from pedigree data, as shown in Fig. 7. Using
the diploid MVR codes of a mother, father and a single child, it is possible to deduce the MVR haplotypes of all four
parental alleles, except at repeat positions where all individuals are heterozygous for the same variant repeats (e.g. all
individuals code 3, at). The minimum data required to determine completely all four parental haplotypes are the diploid
codes of the mother, father and two children who share only one parental allele in common. In more extensive sibships,
for example those of the CEPH families, up to four classes of children differing in their diploid codes can be identified
corresponding to the four possible combinations of parental alleles. Computer programmes have been developed which
can extract the unambiguous parental haplotypic maps from such sibships, as well as identifying the parental alleles
transmitted to each child (Fig. 7 legend). Such large sibships also contain considerable informational redundancy useful
for checking the authenticity of the deduced haplotypes. In addition, 11 of the haplotypes of relatively short alleles
deduced from large pedigrees have also been verified by HaeIII cleavage of PCR amplified alleles (Fig. 7).
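
The deduction of parental haplotypes from a mother-father-child trio can be illustrated rung by rung. This is not the programme referred to above, whose details are not given here; it is a brute-force sketch in which every assignment of transmitted and non-transmitted repeats is tried, and a rung is called only when exactly one assignment fits. All names are hypothetical.

```python
CODE_TO_PAIR = {"1": "aa", "2": "tt", "3": "at", "4": "aO", "5": "tO", "6": "OO"}


def phase_position(mother, father, child):
    """All (maternal transmitted, maternal other, paternal transmitted,
    paternal other) assignments consistent with the three coding states."""
    m, f, c = CODE_TO_PAIR[mother], CODE_TO_PAIR[father], CODE_TO_PAIR[child]
    solutions = set()
    for m_t, m_o in (m, m[::-1]):
        for f_t, f_o in (f, f[::-1]):
            if sorted(m_t + f_t) == sorted(c):
                solutions.add((m_t, m_o, f_t, f_o))
    return solutions


def phase_trio(mother, father, child):
    """Phase each rung; None marks rungs that are ambiguous or inconsistent."""
    phased = []
    for m, f, c in zip(mother, father, child):
        solutions = phase_position(m, f, c)
        phased.append(next(iter(solutions)) if len(solutions) == 1 else None)
    return phased


# First rung: all three individuals are code 3 (at), so it cannot be phased.
print(phase_trio("331", "312", "313"))
```

A second child sharing only one parental allele with the first would resolve rungs left as None, which is the minimum data set described above.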

EXAMPLE 8 

Allelic variability at minisatellite MS32

Family analysis, together with analysis of electrophoretically-separated alleles, has allowed us to generate mapping
data on 254 Caucasian alleles (Fig. 8). Haplotype comparisons show that this collection of alleles contains 248 different
alleles, 243 of which have been detected only once in the alleles surveyed. Under a simple model in which all alleles
are equally rare, Poisson distribution analysis indicates that approximately 6300 different MS32 alleles must exist in
Caucasians to give the sampling frequency distribution shown in Fig. 8A. The only allele with a possibly significant frequency
(Fig. 8A,B) still has a very low frequency (3/254 = 0.012). If this allele is removed, the Poisson estimate for the
total number of alleles increases. The allele number estimate is also likely to be conservative since one of the four pairs
of alleles sampled twice is present in the only Caucasian homozygote so far detected, and another is shared by the parents
of an Amish family; such repeat isolates of alleles may therefore reflect consanguinity/inbreeding rather than alleles
with a significant population frequency shared by unrelated individuals.
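
The equally-rare-allele model can be explored numerically. Under that model, if N different alleles exist and n alleles are sampled, each allele is sampled on average n/N times, and the expected number of alleles seen exactly k times follows a Poisson distribution. The sketch below computes those expectations for a few candidate values of N. How the estimate quoted above was actually fitted is not stated, so the code makes no attempt to reproduce it; the function name and candidate values are assumptions.

```python
import math


def expected_sampling_counts(n_alleles, n_sampled, k_max=5):
    """Expected number of distinct alleles seen exactly k times (k = 1..k_max)
    when n_sampled alleles are drawn from n_alleles equally rare alleles."""
    lam = n_sampled / n_alleles  # mean number of times each allele is sampled
    return {
        k: n_alleles * math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(1, k_max + 1)
    }


# Compare candidate allele numbers with the observed sample of 254 alleles
# (243 alleles seen once, four seen twice, one seen three times).
for candidate in (3000, 6300, 12000):
    counts = expected_sampling_counts(candidate, 254, k_max=3)
    print(candidate, {k: round(v, 1) for k, v in counts.items()})
```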

The allele database also contains several examples of pairs of alleles which show relatively few differences and
where the MVR codes are clearly related (Fig. 8C). Interestingly, these pairs of alleles show differences preferentially
clustered over the beginning of the alleles; this reflects the gradient of variability previously detected along MS32 alleles
mapped in their entirety by HaeIII cleavage (Jeffreys et al., 1990). Note that the mutational change(s) which have
altered the map of these related alleles cannot have resulted in a net change in repeat copy number, which would throw
the allelic MVR codes out of register beyond the point of repeat copy number change, creating multiple differences.

EXAMPLE 9 


Mutation rates and processes at MS32 

The levels of allelic variability at MS32 so far estimated are extraordinary, and go far beyond the number of alleles
which can be discriminated by length (repeat unit copy number). Such ultravariability must be maintained by a high de
novo mutation rate altering the MVR map of MS32 alleles. Allele length changes at MS32 have already been detected
both by pedigree analysis (Armour et al., 1989) and by single molecule PCR analysis of mutant alleles arising by large
deletions (Jeffreys et al., 1990).

To quantify MS32 haplotype mutation rates, diploid MVR codes were analysed in 286 offspring from the CEPH collection
of large families (Fig. 9). 7 offspring were found with MVR codes showing multiple parental exclusions, indicating
the presence of a mutant allele. In each case, code positions specifically excluding only one parent were detected (Fig.
9.1), defining the parental origin of the mutant allele. Non-mutant children in the same family were used to deduce the
haplotypes of the non-mutant parental alleles, whence the non-mutant allele inherited by the mutant child could be identified.
Subtraction of this non-mutant allele from the diploid code of the mutant child yielded the MVR haplotype of the
mutant allele. Comparison of the mutant map with the maps of the two possible progenitor alleles allowed the nature of
the mutation event to be mapped onto the MVR haplotype of the parental progenitor allele(s).
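
The subtraction step, in which a known allele is removed from a diploid code to leave the other allele, can be sketched as follows. The function name `subtract_allele`, the use of '?' for rungs that are uncertain or inconsistent, and the padding of a short known allele with O are assumptions made for the example.

```python
CODE_TO_PAIR = {"1": "aa", "2": "tt", "3": "at", "4": "aO", "5": "tO", "6": "OO"}


def subtract_allele(diploid, known_allele):
    """Remove one known allele from a diploid code, rung by rung, and return
    the haplotype of the remaining allele."""
    # Beyond the end of a short known allele, treat its repeats as O.
    known = known_allele.ljust(len(diploid), "O")
    other = []
    for state, repeat in zip(diploid, known):
        pair = CODE_TO_PAIR.get(state)
        if pair is None or repeat not in pair:
            other.append("?")  # uncertain rung, or known allele does not fit
        else:
            other.append(pair.replace(repeat, "", 1))
    return "".join(other)


# Hypothetical 5-rung diploid code of a child and the inherited non-mutant allele.
print(subtract_allele("13452", "aaOtt"))
```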

The overall mutation rate in MS32 MVR maps is approximately 0.012 per gamete, with paternal and maternal mutations
arising with similar frequency. Curiously, all 7 mutation events so far detected were associated with a gain in
repeat copy number, in most cases of a very small number of repeat units (1-3 repeats). All germline length change
events previously found in MS32 alleles (Armour et al., 1989) were also detected in this survey, together with three additional
events each resulting in the gain of a single repeat unit (29 bp DNA); not surprisingly these events were not
detectable by Southern blot analysis of genomic DNA but nevertheless have a profound effect on the diploid MVR code.

Despite the fact that MS32 alleles are on average 200 repeat units long, the locations of the mutation events are
extremely clustered, in most cases within the first 10 repeat units of the MS32 alleles over the region known to show
maximum allelic variability. This provides further evidence for the presence of a mutational hotspot at the extreme end
of MS32 alleles, responsible for generating the gradient of variability seen along this locus (Jeffreys et al., 1990).

Some information about mutation mechanisms could be deduced from comparison of mutant and progenitor alleles.
For mutant d (Fig. 9.2,3), the site of the single repeat unit addition in the mutant allele is preceded and followed by
MVR code derived from the same maternal allele. Such an event is probably intra-allelic, and could have arisen for
example by unequal sister chromatid exchange or by replication slippage. In contrast, mutant e (Fig. 9.1-3) provides
clear evidence for inter-allelic unequal exchange between the two paternal alleles, the mutant allele commencing with
one paternal haplotype, then switching after two a-type repeats of unknown origin to the beginning of the other paternal
allele. The origin of the two a-type repeats is unclear, but may represent some form of slippage event at a recombination
junction. One other mutant (f) also appeared to have arisen by inter-allelic unequal exchange, but the presumptive
exchange point lies too close to the beginning of the allele to be certain that the mutant allele does contain a recombinant
haplotype. Similarly mutants a, b and g appear to have arisen by an intra-allelic event but the existence of a recombinant
mutant allele cannot be ruled out.

A recombination hot-spot near the end of MS32 alleles?:- Previous studies of DNA markers flanking minisatellite loci
have shown that minisatellite allele length change is not exclusively driven by inter-allelic recombination, though the
possibility of some inter-allelic unequal exchange could not be excluded (Wolff et al., 1988, 1989). Similarly, single molecule
PCR analysis showed that large (39-136 repeat unit) and rare deletions in MS32 alleles were an exclusively intra-allelic
phenomenon, and that inter-allelic unequal exchange could be responsible for, at most, only 6% of these large
deletion events (Jeffreys et al., 1990). Similarly, the existence of MS32 alleles largely homogenised for a-type repeats
suggests that inter-allelic exchange, which would disrupt homogeneous arrays by introducing t-type repeats, must be
relatively scarce.

The present data provide the first evidence that inter-allelic recombination plays a significant role in minisatellite
instability, at least at MS32. One, and probably two, of the mutant alleles bear the hallmarks of unequal crossing over,
although it is as yet unknown whether these represent authentic recombination events since currently available DNA
markers flanking MS32 are too remote to test whether these mutations have been accompanied by exchange of distal
flanking markers. If these two mutation events have arisen by a conventional (if unequal) inter-allelic recombination process,
this implies a recombination frequency of 2/572 = 0.3cM over the first 400bp of the minisatellite, compared with a
mean frequency of 1cM per 10^6 bp in the human genome and therefore representing a 700-fold enhancement of recombination.
If correct this would represent a dramatic example of a human recombination hotspot, and revitalizes earlier
speculation that minisatellites may be actively involved in chromosomal processes such as homologue recognition, synapsis
and meiotic recombination (Jeffreys et al., 1985a; Chandley and Mitchell, 1988; Royle et al., 1988). However, the
simple recombination hotspot model would predict that additional "mutant" MS32 alleles should arise by equal recombination
to produce recombinant haplotypes in which the repeat copy number is identical to one of the parental alleles.
Screening of the CEPH families with multiple offspring should identify such recombinant children as offspring showing
no parental exclusions but showing one (recombinant) haplotype incompatible with haplotypes derived from other children.
Such recombinant alleles have not yet been detected. However, the clustering of exchange events at the extreme
beginning of MS32 alleles may well make many such events undetectable.
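
The rate arithmetic in this example can be laid out explicitly. The sketch below uses only the counts given in the text (286 offspring, 7 mutants, 2 presumptive unequal exchanges, 400bp, and a genome average of 1cM per 10^6 bp); the variable names are hypothetical. Worked without intermediate rounding, the enhancement comes out somewhat above the approximately 700-fold quoted, which is based on rounded values.

```python
offspring = 286
gametes = 2 * offspring                    # each child carries two parental gametes

mutation_rate = 7 / gametes                # mutant alleles per gamete
recombination_cM = 100 * 2 / gametes       # 2 exchanges, expressed as a percentage (cM)

region_bp = 400                            # first 400bp of the minisatellite
genome_cM_per_bp = 1 / 1e6                 # genome average of 1cM per 10^6 bp
expected_cM = region_bp * genome_cM_per_bp # recombination expected over 400bp

enhancement = recombination_cM / expected_cM

print(gametes, round(mutation_rate, 3), round(recombination_cM, 2), round(enhancement))
```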

We have previously suggested that MS32 alleles do not engage in inter-allelic exchanges and therefore evolve
largely, if not exclusively, along haploid chromosomal lineages (Jeffreys et al., 1990). The present data suggest a more
complex picture, with most mutational events involving the gain or loss of small numbers of repeat units towards the
extreme beginning of alleles, and with a significant involvement of inter-allelic exchange in the mutation process. In contrast,
regions of the tandem repeat array distal to the recombination hotspot show much lower allelic variability and
appear to evolve by relatively low frequency intra-allelic processes such as unequal sister chromatid exchange and replication
slippage. This explains how alleles largely homogenized for a-type repeats can accumulate in the population,
and it is significant that such homogeneous arrays are in fact usually preceded by a normal segment containing interspersed
a- and t-type repeats (Jeffreys et al., 1990; data not shown), as expected if inter-allelic exchanges are largely
confined to the beginning of the tandem arrays.

EXAMPLE 10 


The sensitivity of MVR-PCR 

Diploid MVR codes have a great power of individual discrimination. To determine whether MVR-PCR can be applied
to trace levels of DNA, decreasing amounts of human genomic DNA were amplified and typed. Normal profiles were
obtained down to 10ng genomic DNA (not shown). At 0.1-1ng DNA (17-170 diploid genomes), intensity fluctuations
arose within the MVR profile, presumably due to stochastic loss of PCR products from the small number of input minisatellite
molecules (Fig. 10). However, these fluctuations occurred apparently at random, and reliable consensus diploid
codes could be derived by comparison of three replicate MVR profiles obtained from 0.1ng samples of genomic DNA.
Thus MVR-PCR can be extended reliably to sub-nanogram amounts of human DNA.
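
One way to derive a consensus from replicate profiles is a rung-by-rung majority vote. The text does not say how the consensus was actually formed, so the rule used here (a state is accepted when at least two replicates agree, otherwise the rung is marked '?') and the name `consensus_code` are assumptions for illustration.

```python
from collections import Counter


def consensus_code(replicates, min_agree=2):
    """Majority-vote consensus of several replicate diploid codes."""
    consensus = []
    for states in zip(*replicates):
        state, votes = Counter(states).most_common(1)[0]
        accepted = votes >= min_agree and state != "?"
        consensus.append(state if accepted else "?")
    return "".join(consensus)


# Three hypothetical replicates with random intensity fluctuations.
print(consensus_code(["1321", "1341", "1325"]))
```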
EXAMPLE 11

MVR-PCR on degraded DNA 

MVR-PCR does not require intact minisatellite alleles but instead recovers information from any DNA fragments
which are long enough to include the flanking 32D priming site located 192-212bp before the minisatellite plus at least
some 29bp repeat units (Fig. 1C). To determine whether MVR-PCR can therefore be applied to degraded DNA, human
genomic DNA was progressively sheared by sonication before amplification by MVR-PCR. Even highly sheared DNA
(mean fragment size 400bp, >95% of DNA <1000bp long) generated authentic diploid MVR code information, although
the code faded out after approximately 25 repeat units (data not shown). Nevertheless, the truncated codes obtainable
from degraded DNA are still compatible with database searches, although with increasing loss of discrimination power
as the code is progressively shortened. For highly degraded DNA, additional information can be recovered by replacing
the flanking primer 32D with 32O (sequence GAGTAGTTTGGTGGGAAGGGTGGT), which primes immediately adjacent
to the start of the tandem-repeat array (Fig. 1C) (data not shown).

EXAMPLE 12

MVR-PCR on mixed DNA samples 

Forensic samples sometimes contain DNA from two or more individuals. In particular, semen-bearing vaginal
swabs from rape victims can yield DNA both from the known victim and from the assailant. To determine whether MVR-PCR
can be applied to such mixed samples, genomic DNA from one individual was mixed with decreasing amounts of
DNA from a second individual and typed (Fig. 11). Clear indication of admixture could be obtained down to 10% mixtures
of DNA. Comparison of the code from the pure DNA sample ("victim") with the mixed DNA sample enabled an
incomplete and ambiguous diploid code of the "assailant" to be derived (Fig. 11) which nevertheless can still be used
successfully to interrogate a diploid code database to find a matching individual and to determine the frequency of
matching individuals in the database. Thus MVR-PCR can be applied to mixtures of DNA from two individuals (e.g. victim
plus rapist DNA recovered from semen-bearing vaginal swabs), particularly if pure DNA from one of the individuals
(e.g. victim) is available.
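
The derivation of an incomplete "assailant" code from a mixture can be sketched using band presence alone. For each track, a band absent from the mixture must be absent from the second contributor, a band present in the mixture but not in the victim must come from the second contributor, and a band present in both is uninformative. The sketch ignores band intensities, which is a simplifying assumption, and all names are hypothetical.

```python
# Diploid coding state -> (band in A-track, band in T-track)
BANDS = {
    "1": (True, False), "2": (False, True), "3": (True, True),
    "4": (True, False), "5": (False, True), "6": (False, False),
}


def deduce_second_contributor(mixture_bands, victim_code):
    """For each rung return (A, T) for the second contributor, where each entry
    is True (band present), False (band absent) or None (cannot be determined)."""
    result = []
    for (mix_a, mix_t), state in zip(mixture_bands, victim_code):
        vic_a, vic_t = BANDS[state]
        result.append(tuple(
            False if not mix else (None if vic else True)
            for mix, vic in ((mix_a, vic_a), (mix_t, vic_t))
        ))
    return result


# Hypothetical mixture profile over three rungs, victim code 1, 1, 2.
mixture = [(True, True), (True, False), (False, True)]
print(deduce_second_contributor(mixture, "112"))
```

Entries of None are what make the deduced code ambiguous and reduce its power of discrimination relative to a normal diploid code.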

The incomplete assailant codes deducible from mixed DNA samples will have less power of discrimination than
normal diploid MVR codes. To determine the efficiency of identification using information from mixed DNA samples,
2x10^6 combinations of "victim", "rapist" and "false suspect" were created from the population database of MVR codes
and checked for suspect exclusions (Fig. 12). The mean number of exclusions over the first 50 repeat units fell from 30
for normal MVR codes (Fig. 3) to 14 for comparisons of the ambiguous code deducible from mixed DNA samples with
the normal MVR code of a suspect. The mean power of exclusion nevertheless remains very high (99.9993%).

EXAMPLE 13 

The efficiency of diploid MVR codes in paternity testing 


MVR-PCR can be used in parentage testing since the diploid codes of non-parents will frequently show exclusionary
mismatches with the child. Such mismatches can either be directional, defining which parent is excluded, or ambiguous,
indicating an inconsistency within an alleged mother-father-child trio but not defining which parent is excluded
(see for example Fig. 9.1). To determine the effectiveness of MVR-PCR in excluding non-fathers, the MVR codes of 115
Caucasian mother-child duos were determined, and each duo was then compared with each of 249 unrelated Caucasian
individuals ("non-fathers") (Fig. 13A). On average, 9.9 exclusions were obtained per comparison of the first 50
repeats of the MVR codes, of which 4.7 were paternal-specific exclusions and the remainder directionally ambiguous.
98.9% of non-fathers showed at least one paternal-specific exclusion, and 99.8% showed at least one exclusion in total
(paternal-specific plus ambiguous).
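
The distinction between paternal-specific and ambiguous exclusions can be made concrete at the level of a single rung. The working definitions used in this sketch are assumptions: a rung is called paternal-specific when the alleged father shares no repeat type with the child, and ambiguous when each adult is individually compatible with the child but no combination of one maternal and one paternal repeat gives the child's state. The function names are hypothetical.

```python
CODE_TO_PAIR = {"1": "aa", "2": "tt", "3": "at", "4": "aO", "5": "tO", "6": "OO"}


def classify_position(mother, child, man):
    """Classify one rung of a mother-child-alleged father comparison."""
    m, c, f = (CODE_TO_PAIR[s] for s in (mother, child, man))
    if not set(f) & set(c):
        return "paternal-specific"  # the man could not have contributed either repeat
    trio_ok = any(sorted(x + y) == sorted(c) for x in m for y in f)
    return "consistent" if trio_ok else "ambiguous"


def tally_exclusions(mother, child, man, n_positions=50):
    """Count paternal-specific and ambiguous exclusions over the first rungs."""
    counts = {"paternal-specific": 0, "ambiguous": 0}
    for m, c, f in list(zip(mother, child, man))[:n_positions]:
        if "?" in (m, c, f):
            continue  # assumption: uncertain rungs are skipped
        kind = classify_position(m, c, f)
        if kind in counts:
            counts[kind] += 1
    return counts


# Hypothetical 3-rung codes for mother, child and an unrelated man.
print(tally_exclusions("113", "131", "211"))
```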

Null or O-type repeats are relatively rare and when present in a child but not mother provide relatively powerful
markers for excluding non-fathers. However, identification of heterozygous null positions (code 4, aO; 5, tO) requires
correct interpretation of band intensities in the child. To determine the contribution of null repeats to the efficiency of
paternity testing, the simulated non-paternity cases were re-evaluated after elimination of all code 4 and 5 positions in
each child (Fig. 13B). As expected, the mean numbers of exclusions fell significantly, causing a drop in the proportion
of non-fathers showing exclusions from 99.8% to 99.5% (99.8% to 95.9% for paternal-specific exclusions). The above
estimates for the efficiency of non-paternal exclusion are a mean over all mother-child duos. Variation between duos in
these levels of exclusion was therefore investigated (Fig. 13C). The proportion of the 249 "non-fathers" excluded, either
by paternal-specific exclusions or by total exclusions, varied substantially from duo to duo, according to the precise
nature of the MVR codes of the mother and offspring. In the worst case, only 80% of non-fathers could be excluded.
Nevertheless, the power of non-paternal exclusion of a single locus is impressive, and is significantly greater than can
be achieved by conventional Southern blot analysis of genomic DNA (Odelberg et al., 1989). This power is however offset
to some extent by the relatively high de novo mutation rate of 1.2% per gamete, resulting in approximately 1.2% of offspring
showing mutation of the paternal allele. The false exclusion rate of fathers is therefore approximately 1.2%, and
the false inclusion rate of non-fathers is on average approximately 0.2-4.1%, depending on the nature of the exclusions
employed in paternity testing.

EXAMPLE 14 

MS32 - Allelic variability

339 Caucasian alleles have been mapped by family analysis and from separated alleles. Haplotype comparisons
revealed 326 different alleles, 316 detected only once in the alleles surveyed, together with 9 alleles sampled twice and
one allele detected three times. The maximum frequency of any allele at this locus is therefore very low (3/339=0.009).
Under a simple model in which all alleles are equally rare, Poisson distribution analysis indicates that approximately 3500
different MS32 alleles must exist in Caucasians to give this sampling frequency distribution. Given the high mutation rate of
MS32 (see below), the true level of allelic diversity in humans is likely to be gigantic, with >10^8 different and
distinguishable alleles in contemporary human populations. Different MS32 alleles can have related MVR haplotypes. All 326
different alleles were therefore compared to identify groups of alleles which showed significant similarities in repeat maps
and to eliminate groups of alleles dominated by a-type repeats which showed high levels of matching without clear
indication of significant relatedness (Fig. 14A). This heuristic alignment approach showed that 47% of alleles could be
classified into 32 different groups each containing 2-22 significantly related alleles; each of the remaining 174 alleles
showed no detectable matches with any other alleles. Examples of groups of related alleles are shown in Fig. 14B. Most
significantly, the majority of inter-allelic differences in repeat copy number and interspersion pattern of a- and t-type
repeat units are restricted to the extreme beginning of the tandem array, over the region previously identified as showing
greatest allelic variability. Less frequent differences further into alleles also occur, resulting mainly from minor changes
in repeat unit copy number, and apparently from switching of a- and t-type repeats without changes in the number of
repeat units.
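The equal-frequency model mentioned above can be explored numerically. The following is a minimal sketch, not the fitting procedure actually used (which is not described here): it assumes that each of a hypothetical number of equally rare alleles appears in the sample a Poisson-distributed number of times, and prints the expected numbers of distinct alleles and of alleles sampled once, twice and three times for a sample of 339 alleles. The function names and the trial allele numbers are illustrative assumptions.

```python
import math


def expected_sampling(n_alleles: int, sample_size: int, max_count: int = 3):
    """Expected number of alleles seen exactly k times, for k = 0..max_count."""
    lam = sample_size / n_alleles  # mean number of copies of each allele in the sample
    return [
        n_alleles * math.exp(-lam) * lam**k / math.factorial(k)
        for k in range(max_count + 1)
    ]


def expected_distinct(n_alleles: int, sample_size: int) -> float:
    """Expected number of different alleles seen at least once."""
    lam = sample_size / n_alleles
    return n_alleles * (1.0 - math.exp(-lam))


# Trial values for the number of alleles in the population (assumed, for illustration).
for n_alleles in (1000, 3500, 10000):
    unseen, once, twice, thrice = expected_sampling(n_alleles, 339)
    print(
        n_alleles,
        round(expected_distinct(n_alleles, 339), 1),
        round(once, 1),
        round(twice, 1),
        round(thrice, 1),
    )
```

Comparing the printed expectations with the observed counts (326 different alleles, 9 sampled twice, one sampled three times) shows how strongly the sampling distribution constrains the number of alleles under this simple model.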

EXAMPLE 15

Detection of cystic fibrosis (CF) mutations using tailed ARMS primers

In order to develop a useful ARMS test for somatic cancer mutations (e.g. the C>T mutations of codon 1338 of the
APC gene) it is necessary to be able to distinguish between PCR failure and the absence of the mutation. The inclusion
of additional control PCR reactions for the purpose of demonstrating PCR activity in negative ARMS tests causes a
reduction in test sensitivity. However, as sensitivity is an absolute requirement of such tests, a method for the inclusion
of a positive control PCR reaction without compromising sensitivity is required. The following two-step method employs the
5' TAG sequence technique for this purpose:

Step 1 - A multiplex PCR reaction containing ARMS primers and control PCR reaction primers. All four primers
carry the TAG sequence as a 5' tail. The control PCR amplimers are included at a low concentration relative to the
ARMS primers (e.g. 10 nM cf. 1 µM). Thus, the ARMS reaction works with much greater efficiency than the control if
the mutation is present.

Step 2 - The products of the PCR reaction (step 1 above) are used to seed the second PCR reaction. The second
PCR reaction contains only one primer: the TAG sequence. The TAG primer is rich in G+C residues relative to the
ex-TAG portion of the control PCR primers. This second PCR reaction is performed at a high annealing temperature
which prevents the action of any carried-over control PCR primers. The TAG primers, however, are able to
efficiently amplify the control and ARMS PCR products at the elevated annealing temperature. Therefore, the presence
of the somatic mutation will lead to the formation of an ARMS and a control PCR product after step 2 PCR whereas
in its absence only the control PCR product will be detected.
Primer                     Sequence 5'>3'                                            %(G+C)

TAG                        GCGACCGGTCGCCGGACGCC                                      85.0
CFTR Exon 3A Control 5'    GCGACCGGTCGCCGGACGCC^ctaaaataitt                          18.2 ex-TAG
CFTR Exon 3A Control 3'    GCGACCGGTCGCCGGACGCCtmcataatcacaaaaat                     15.8 ex-TAG
Common APC                 GCGACCGGTCGCCGGACGCC^aaataaaagaaaagattggaactaggtcage      34.3 ex-TAG
Mutant-specific APC        GCGACCGGTCGCCGGACGCCggctgattctgaagataaactagaacccga        43.3 ex-TAG
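The %(G+C) values in the table can be checked directly from the sequences. The short sketch below is illustrative only (the function name is an assumption); it recomputes the value for the TAG sequence and for the ex-TAG (lowercase) portion of the mutant-specific APC primer, the two entries whose sequences are fully legible above.

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C residues in a sequence, rounded to one decimal place."""
    seq = seq.upper()
    return round(100.0 * sum(base in "GC" for base in seq) / len(seq), 1)


TAG = "GCGACCGGTCGCCGGACGCC"
MUTANT_APC_EX_TAG = "ggctgattctgaagataaactagaacccga"

print(gc_percent(TAG))                # 85.0
print(gc_percent(MUTANT_APC_EX_TAG))  # 43.3
```

The large difference between the two values is what allows the second PCR to be run at an annealing temperature at which only the TAG primer is expected to prime efficiently.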



Extension of Standard ARMS Test to include the detection of R553X and W1282X CF mutations :- The development of
an overlapping ARMS test for the simultaneous detection of the closely linked mutations G542X and G551D is described in our
European Patent Application, publication no. 0497527.

The incidence of the R553X mutation, also located in exon 11 and separated from G551D by only 5 base pairs, is
significant in CF affected individuals. As such, a method which would allow the simultaneous detection of all 3 mutations
would prove valuable in determining CF carrier status. The simultaneous detection of the G551D and R553X mutations
presents two additional technical problems:

i) direct competition of the G551D and R553X primers for target genomic DNA (5 bp separation, therefore the ARMS
primers themselves overlap - not a problem observed in the case of G542X/G551D overlapping ARMS)
ii) the G551D and R553X mutant PCR products would be indistinguishable by size difference using 3% agarose
gels.

In an attempt to overcome the latter problem an elongated R553X mutant ARMS primer of 60 bp was synthesised
(conventional ARMS primers are normally 20-30 bp), thereby creating a 39 bp size difference between the expected
G551D and R553X mutant product bands.

Initially a 60 bp mutant ARMS primer (2134) containing an additional G-G destabilising mismatch at the -2 position
of the 3' end, but otherwise totally homologous to target DNA sequence, was included in the Standard ARMS test 'A'
reaction mix at 1 µM concentration. R553X mutant product was detected and the ARMS primer was specific for only
mutant DNA sequence. 621+1 normal, ΔF normal and G542X mutant product bands were unaffected by inclusion of
the R553X primer. However G551D mutant product bands were no longer visible, suggesting that the R553X mutant
primer bound more effectively to target DNA, thereby preventing any hybridisation of G551D mutant primer. Any further
destabilisation of the R553X mutant primer at the 3' end (to allow the G551D mutant primer to bind target DNA also)
was likely to compromise the yield of R553X PCR product. Likewise, reducing the severity of the G551D mutant primer
mis-match was likely to compromise specificity. Consequently, a second elongated R553X mutant ARMS primer (2150)
was synthesised which was no longer completely homologous to target DNA at the distal (5') end. The primer was
otherwise identical to the original 2134 primer at the proximal (3') end and thus the ARMS specificity was unchanged.

When the 2150 R553X mutant ARMS primer (5' non-homologous tail) was included in the Standard 'A' reaction mix
both R553X and G551D mutant products were detected, i.e. the increased 5' destabilisation of the R553X mutant primer
enabled the G551D mutant primer to compete for target DNA. Again the 621+1, ΔF508 and G542X product bands were
unaffected.

The revised Standard ARMS test allowing detection of R553X in addition to 621+1, G551D, G542X and ΔF508
mutations has been tested with a number of positive control DNA samples (including an individual compound
heterozygous for both R553X and G551D) and correct diagnoses obtained.

Although the R553X mutation could be easily detected using the method described above, the yield of R553X
mutant product was generally lower than that observed for the other A-mix PCR products. In order to increase the
amount of R553X product, and thereby obtain an overall balanced A-mix band profile, several approaches were
evaluated :-

i) increasing R553X mutant primer concentration
ii) reducing mis-match severity
iii) adding a secondary TAG primer specific for the 5' non-homologous tail of the R553X mutant primer

The first two approaches were unsuccessful. Increasing the 2150 primer concentration to 4 µM did not markedly
increase product yield. Using an R553X mutant primer containing no additional 3' destabilising mis-match increased
product yield but was no longer specific for R553X mutant target sequence (the effect of a G-T mismatch was not
investigated).

A TAG primer (30 mer, 2173) specific for the 2150 5' non-homologous tail was also included in the revised Standard 'A'
reaction mix and the yield of R553X mutant product compared with that obtained using revised Standard 'A' mix
(primer 2150) only. Inclusion of the TAG primer resulted in a marked increase in the amount of R553X product. Further,
increasing the TAG:ARMS-tail primer ratio appeared to increase product yield. The optimal result was achieved using
2150 at 1 µM and the corresponding TAG primer at 3 µM. A second tailed ARMS primer (2180), identical to 2150 at the
3' end but with a modified 5' sequence, was employed in conjunction with TAG primer 2164 but this particular combination
failed to produce any R553X mutant product.

TABLE 1

MUTATION    EXON         PRIMER    SPECIFICITY    MIS-MATCH

W1282X      <iU 1U                 r» a           U-A
                         2011      M              C-A
                         2013      N              G-A
                         2012      M              G-A
                         2155      N              A-A
                         2109      M              A-A
                         2914      C

1717-1      INTRON 10    2065      N              G-T
                         2070      M              G-T
                         2066      N              G-A
                         2066      M              G-A
                         2067      N              G-G
                         2069      M              G-G
                         1823      C



TABLE 2

MUTATION   EXON   PRIMER   SPECIFICITY   MIS-MATCH   LENGTH   5' HOMOLOGY   5'-TAIL SEQ.

R553X      11     2189     M             G-G         30       -             -
                  2134     M             G-G         60       YES           -
                  2150     M             G-G         60       NO            TAG 1
                  2172     M             -           60       NO            TAG 1
                  2180     M             G-G         60       NO            TAG 2

TAG 1 = 2173
TAG 2 = 2164
EXAMPLE 16

Analysis of "null" repeat units

Preparative MVR-PCR and Sequencing of Null Repeat Units :- 100 ng samples of total DNA, or equivalent amounts
of individual alleles separated from MboI digests of total genomic DNA by preparative gel electrophoresis, were amplified
in 30 µl reactions in the presence of 1.5 units AmpliTaq (Perkin-Elmer-Cetus), using the primers and PCR buffer system
described in the above Examples. Reactions were cycled for 1.3 min at 96°C, 1 min at 68°C, and 5 min at 70°C for 30
cycles on a DNA Thermal Cycler (Perkin-Elmer-Cetus), followed by a 2 cycle chase of 1 min at 68°C, 10 min at 70°C.
Amplified products were electrophoresed through a 1.1% agarose gel and visualized by ethidium bromide staining. An
appropriate "rung" in the MVR-PCR ladder required for sequencing was excised from the gel and purified by
electroelution onto dialysis membrane. The purified PCR product was reamplified with PCR primers 32-O and TAG using the
same cycling conditions as before for a further 18 cycles. The double-stranded PCR product was re-purified by
electrophoresis and electroelution and sequenced directly.

MVR-PCR of null repeats :- this was performed using primer 32-TAG-N (5 nM final concentration) or 32-TAG-J
(10 nM final concentration) instead of the a-type or t-type specific primers 32-TAG-A and 32-TAG-T (for primer
sequences see Table 3). Other conditions and the driver primers used (32-D or 32-O plus TAG) were as previously
described.

Sequence Analysis of MS32 Null Repeat Units :- we have previously described how haplotypic MVR maps from
individual alleles can be determined, either from electrophoretically-separated alleles or by pedigree analysis of digital
codes generated by MVR-PCR from total genomic DNA. From this survey, three individuals were chosen, each of whom
had an MS32 allele containing one or more null or O-type repeat(s) within the first 20 repeat units. Separated alleles or
total genomic DNA was amplified by MVR-PCR to the point where PCR products could be visualised directly on
agarose gels by staining with ethidium bromide; up to 20 repeat rungs on the MVR ladder could be generated (data not
shown). For separated alleles, the rung two repeat units above the null repeat was excised from the gel, re-amplified
and sequenced. For total genomic DNA, a suitable band specific to the relevant allele was identified at an aA
heterozygous rung position above the position of the O-type repeat, followed by purification and sequencing.

The sequences of the three null repeats characterised are shown in Table 3. All three shared the same A base
deletion 3 bp 3' to the G/A polymorphic site which distinguishes a- and t-type repeat units. The null repeat unit sequences
were otherwise normal and contained either G or A at the major polymorphic site. This single base deletion is sufficient to
block priming by the MVR-PCR primers 32-TAG-A and 32-TAG-T; null repeats containing this variant are referred to as
N-type repeats.

MVR-PCR of N-type Repeats :- to determine the frequency of N-type repeats in MS32 alleles, a new MVR-PCR
primer, 32-TAG-N, was designed to prime specifically off these repeats. This primer incorporates the TAG sequence as
previously described and can be used in MVR-PCR as a replacement for the a- or t-type specific primer (Table 3, Figure
17). The majority of individuals previously identified as containing alleles with null repeats were remapped using
32-TAG-N (Figure 17). Most null repeats were positively identified by primer 32-TAG-N at the position previously identified
from intensity differences in the A and T lanes (32-TAG-A, 32-TAG-T) as being heterozygous or homozygous for a null
repeat. A minority of null repeat units failed to amplify with 32-TAG-N (Figure 17, individuals 4 and 5), indicating the
presence of additional repeat unit variant(s) which could not be detected by primers 32-TAG-A, -T or -N.

In a survey of the first 50 repeat units in 391 different Caucasian and Japanese alleles (18,790 repeat units in total),
285 repeats were null or O-type repeats (1.5%) and 241 of these repeats were detected as N-type (Table 3). Thus
84.5% of all null repeats were identified using the 32-TAG-N primer and therefore share the A deletion; the possibility
of additional variation between N-type repeats which does not block priming by 32-TAG-N cannot however be excluded.
The incidence of N-type repeats is very similar in Caucasians and Japanese (1.39% and 1.26% of all repeats,
respectively).

Sequencing of One of the Minor Null Repeats :- In an attempt to characterise further the remaining null repeat units
not detected by 32-TAG-N, a single repeat unit of this type was sequenced from a Japanese allele. This J-type repeat
contained a C->T transition immediately 3' to the major G/A polymorphic site in an otherwise normal repeat unit
sequence (Table 3). A new PCR primer (32-TAG-J) designed to assay this sequence variant was tested on all DNA
samples that contained null repeat units not detected by 32-TAG-N. Only 3 repeat units in 2 different Japanese alleles
were detected with this primer (data not shown). The remaining null repeat units not detected by 32-TAG-N or
32-TAG-J are referred to as U-type (undetectable) repeats and contain as yet uncharacterised repeat variant(s). The
frequency of U-type repeats varies substantially between Caucasian and Japanese alleles (0.18% vs. 0.56% of all repeat
units, respectively).

Distribution of null repeats in MS32 alleles :- 23% of alleles (91 alleles out of 391 different Caucasian and Japanese
alleles typed) contained one or more null repeats within the first 50 repeat units. Null repeats appear to occur with equal
likelihood at any position within the mapped region of these alleles (Figure 18A). Analysis of the number of null repeats
in different alleles (Figure 18B) showed clear evidence of clustering of nulls, particularly N-type repeats, within a limited
number of alleles. In one extreme case, 12 N-type repeats were present within the first 50 repeats, and in another
bizarre case, an allele contained a succession of 8 U-type repeats followed by NaNN embedded in an allele otherwise
fixed for a-type repeats (Figure 18C).

Although the vast majority of MS32 alleles so far typed have different MVR maps, different alleles can nevertheless
show internal regions of significant map similarity, suggesting recent common ancestry of these allele segments (Figure
19). These shared haplotypic segments occur much more frequently at one end of MS32 alleles, distal to the unstable
proximal region mapped by MVR-PCR which contains a localised mutation hot-spot. 59% of the 391 different MS32
alleles so far mapped can be aligned into 40 different groups of related alleles. 77% of the 91 alleles that contain null
repeat units fall within these aligned groupings. In every case where two or more alleles shared a null repeat at
equivalent positions within the shared haplotype, MVR-PCR showed that the null repeats were of identical types (almost
always N-type repeats) (Figure 19, groups A, B). Additional N-type repeats restricted to just one of the alignable alleles
almost always lay outside the shared haplotypic region. In contrast, U-type repeats tend to occur sporadically within
otherwise preserved haplotypes shared by related alleles and are usually confined to only one of the aligned alleles (Figure
19, group C).

Effect of Null Repeats in Paternity Testing :- to use digital MVR codes from total genomic DNA for paternity analysis,
it is necessary to identify correctly code positions heterozygous for null repeats (a/O, t/O). However, since null repeats
are scarce, the presence of a null-containing paternal allele in a child will add substantially to the ability of MVR-PCR
to exclude non-fathers of such a child. To estimate the overall effect of null repeat units in paternity testing, the MVR
codes of 141 different Caucasian mother-child duos were each compared with 302 different unrelated Caucasians over
the first 50 repeat units (42,582 different mother-child-non-father trios in total). On average, 9.6 exclusions were
obtained per comparison, of which 4.6 were paternal-specific and the remainder directionally ambiguous (Figure 19A);
97.5% of non-fathers showed at least one paternal-specific exclusion, and 99.86% showed one or more exclusions in
total (paternal-specific plus ambiguous). Since maternity is seldom an issue in paternity cases, the first 50 repeats
contain enough information to exclude on average 99.86% of non-fathers. To determine the contribution of null repeats
to this efficiency, the simulated paternity cases were re-evaluated after elimination of all code positions in each child
heterozygous for a null repeat (a/O, t/O), including both authentic null repeats and non-existent 'null' repeats from
beyond the end of short alleles (Figure 19B). As expected, the mean number of exclusions fell significantly, causing a
drop in the proportion of non-fathers showing exclusions from 99.86% to 99.50% (97.5% for paternal-specific
exclusions only).

These estimates for the efficiency of non-paternal exclusion are a mean over all mother-child duos. Variation
between duos in levels of exclusion was therefore investigated (Figure 19C). The proportion of the 302 'non-fathers'
excluded varied substantially from duo to duo, depending on the precise nature of the MVR codes in the mother and
child. In the worst case, only 95% of non-fathers could be excluded (74% if only paternal-specific exclusions are used).
As expected, these estimates are worsened if null repeat positions are eliminated from the analysis.

Internal mapping of variant repeat units within minisatellites represents an important new approach both to DNA
typing and to the analysis of allelic variability and minisatellite mutation processes. Work to date on minisatellite MS32
has concentrated on a G/A base substitutional polymorphism originally defined by the presence/absence of a HaeIII
cleavage site within repeat units. A second common polymorphic site 2 bp from the variable G/A site has been found
from sequence analysis of cloned MS32 (see Table 3) but has yet to be used for internal mapping. MVR-PCR has now
revealed additional rare variants defined operationally as "null" repeats which cannot serve as priming sites for the
MVR-PCR primers 32-TAG-A or -T. These variants have presumably arisen by repeat unit sequence mutation, and their
incidence is governed by a balance between mutation and fixation/extinction within and between repeat arrays by
processes such as unequal exchange and replication slippage. The relative scarcity of null repeats makes them particularly
useful for identifying related alleles and confirming the authenticity of allele alignments.

87% of null repeats in Caucasian alleles share a common variant, the N-type repeat, which can now be detected
reliably by MVR-PCR. The widespread occurrence of N-type repeats in both Caucasian and Japanese alleles and their
presence in many groups of aligned alleles suggest that this variant arose fairly early in the evolution of MS32 alleles.
Several different groups of aligned alleles contain N-type repeats within a "NataNata" motif (Figure 19), suggesting a
"supergroup" of alleles sharing homologous patches of tandem repeats within alleles which are otherwise not obviously
alignable.

The remaining null repeats include the rare J-type repeat and the as yet unsequenced U-type repeats. 8% of alleles
contain U-type repeats, and the majority of these alleles (26/33) have only a single U variant over the region mapped,
suggesting recent mutation without subsequent diffusion into neighbouring repeats. This is supported by U-containing
alleles which fall within groups of alignable alleles; in each of the five cases where the U-type repeat lay within a
haplotypic segment shared by several alleles, other alleles contained an a- or t-type repeat at the corresponding position
(see Figure 19, group C). This suggests either very recent repeat unit mutation from a or t to U, and thus sequence
heterogeneity amongst different U-type repeats, or possibly that the U repeat is ancestral within a group of aligned alleles
and has recently been replaced by an a- or t-type repeat by a process such as microconversion which does not affect repeat
unit copy number or the flanking MVR map. Two probable instances of U-type repeat diffusion subsequent to mutation
have been found. In one Japanese allele with two U repeats, the variants are contained within a perfect high-order
tandem repeat of a 24 repeat unit segment commencing 3 repeat units from the beginning of the allele, and presumably
contain the same variant (not shown). In the second case, an English allele contains a block of 8 U-type repeats (Figure
18C) which again have presumably expanded from a single mutant repeat.

The existence of variant repeats with abnormal repeat length, for example the N-type repeat which is 28 bp rather than
29 bp long, could create problems in digital coding from genomic DNA, by moving the MVR ladders of each allele out of
register. In practice however, aberrant length repeats do not appear to present a significant problem; in the worst individual
so far found, with 12 N-type repeats in one allele and none in the other, the digital code could be unambiguously read
for more than 50 repeat unit positions, although the normally perfect spacing of rungs on the ladder was slightly
perturbed by the progressive misalignment of the two allele ladders (maximum misalignment of 12 bp for the 50 repeat unit
PCR products 1714 bp long) (data not shown).
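The progressive misalignment of the two allele ladders can be illustrated with a short calculation. The sketch below assumes only the repeat lengths quoted above (29 bp for a- and t-type repeats, 28 bp for N-type repeats); the two allele maps are hypothetical and were chosen so that one allele carries 12 N-type repeats within its first 50 repeat units and the other carries none.

```python
from itertools import accumulate

# Repeat unit lengths in bp, taken from the lengths quoted in the text.
REPEAT_LENGTH = {"a": 29, "t": 29, "N": 28}


def rung_offsets(allele):
    """Cumulative length in bp after each successive repeat unit of an allele map."""
    return list(accumulate(REPEAT_LENGTH[repeat] for repeat in allele))


def max_misalignment(allele_1, allele_2):
    """Largest difference in bp between corresponding rungs of two allele ladders."""
    return max(
        abs(x - y) for x, y in zip(rung_offsets(allele_1), rung_offsets(allele_2))
    )


# Hypothetical allele maps covering the first 50 repeat units.
with_nulls = ["N", "a", "a", "t"] * 12 + ["a", "a"]  # 12 N-type repeats
without_nulls = ["a"] * 50

print(max_misalignment(with_nulls, without_nulls))  # 12
```

Each N-type repeat shifts all later rungs of its allele by 1 bp, so the misalignment never exceeds the number of N-type repeats already passed, which is small relative to the 29 bp rung spacing.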

MVR-PCR can also be used for paternity testing, provided that heterozygous null positions (a/O, t/O) in diploid
codes (3.0% of all positions) can be reliably identified. Experience to date suggests that these positions can be identified
with >90% reliability solely from band intensity information using primers 32-TAG-A and -T alone. The ability to detect
definitively the substantial majority of null repeats using primers 32-TAG-N and -J substantially increases the reliability,
to provide a single locus which is remarkably effective at excluding non-fathers, though limited by the high de novo
mutation rate creating new MVR haplotypes at MS32.



TABLE 3

Sequence of MS32 variant repeat units and their distribution in the
first 50 repeat units of Caucasian and Japanese alleles.

Repeat type   Sequence                                Caucasian (%)     Japanese (%)
                                                      (n = 15536)^a     (n = 2868)^a

a             5'-GGCCAGGGGTGACTCAGAATGGAGCAGGY-3'     73.5              75.1
t             5'-GACCAGGGGTGACTCAGAATGGAGCAGGY-3'     25.0              22.9
N             5'-GRCC-GGGGTGACTCAGAATGGACGAGGY-3'     1.26              1.39
J             5'-GGTCAGGGGTGACTCAGAATGGAGCAGGY-3'     0                 0.07
U             unknown                                 0.18              0.56

Y=C or T. R=G or A. a-, t-, N-, and J-type repeat units were detected by the
following MVR-specific primers:

32-TAG-A, 5'-tcatgcgtccatggtccggaCATTCTGAGTCACCCCTGGC-3';
32-TAG-T, 5'-tcatgcgtccatggtccg^
32-TAG-N, 5'-tcatgcgtccatggtccggaTCCATTCTGAGTCACCCCGG-3';
32-TAG-J, 5'-tcatgcgtccatggt^

The 3' sequence of each primer (uppercase) is complementary to each repeat unit variant and is preceded by a
common TAG sequence (lowercase) used to drive subsequent amplification. U-type repeat units are not amplified by
any of these MVR-specific primers. ^a Number of repeat units scored in 324 Caucasian alleles and 59 Japanese alleles.
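The sequence differences summarised in Table 3 can be expressed as a simple classifier for a sequenced repeat unit. This is a hypothetical sketch, not the primer-based detection used in MVR-PCR: it inspects only the first five bases and the unit length, the function name is an assumption, and the example N-type unit is derived here by deleting the A that lies 3 bp 3' to the G/A site of an a-type unit, as described in the text.

```python
def classify_repeat(unit: str) -> str:
    """Classify a sequenced MS32 repeat unit from its first five bases and length."""
    unit = unit.upper()
    start = unit[:5]
    if len(unit) == 29 and start == "GGCCA":
        return "a"  # G at the polymorphic site (HaeIII site GGCC present)
    if len(unit) == 29 and start == "GACCA":
        return "t"  # A at the polymorphic site
    if len(unit) == 29 and start == "GGTCA":
        return "J"  # C->T change immediately 3' to the G/A site
    if len(unit) == 28 and start in ("GGCCG", "GACCG"):
        return "N"  # single A deletion 3 bp 3' to the G/A site
    return "unclassified"


# Example units, with the ambiguous final base Y written as C.
A_UNIT = "GGCCAGGGGTGACTCAGAATGGAGCAGGC"
T_UNIT = "GACCAGGGGTGACTCAGAATGGAGCAGGC"
J_UNIT = "GGTCAGGGGTGACTCAGAATGGAGCAGGC"
N_UNIT = A_UNIT[:4] + A_UNIT[5:]  # a-type unit with the fifth base (A) deleted

for unit in (A_UNIT, T_UNIT, J_UNIT, N_UNIT):
    print(classify_repeat(unit), len(unit))
```

Units that match none of these patterns are returned as "unclassified", which loosely corresponds to the as yet uncharacterised U-type variants.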

EXAMPLE 17 

Allele 'knockout' MVR-PCR

The observed variation seen for diploid codes (no two individuals among 408 unrelated Caucasians and Japanese
so far typed share the same diploid code) is based directly upon the massive variation of individual alleles. The estimate
for the minimum number of distinguishable alleles present in current Caucasian populations is around 3500 (estimated
from the number of different alleles observed in a sample of 337 mapped alleles, which contained 326 different alleles
of which 316 were only sampled once). The true number of different alleles is certainly in excess of this and may be as
high as 10^8 for the total world population (based on known mutation rate and population size; Jeffreys et al., 1991).
Allele mapping is providing remarkable insights into the evolution of minisatellites and the generation of new length
alleles with, for the first time, preliminary evidence for the role of unequal interallelic exchange, or interallelic gene
conversion, in the generation of new mutant alleles (Jeffreys et al., 1991). We also predict that allele mapping will prove an
invaluable tool in the analysis of human population divergence through the generation of allele groupings from which it
may prove possible to derive both allele and human population lineages.

The structure of individual MS32 alleles using MVR-PCR can presently be approached in two ways: first, by
pedigree analysis of diploid codings and second, by mapping of individual separated alleles. Using family groups it is
possible to derive incomplete allele maps from father, mother, single child trios and total unambiguous allele maps from
father, mother and two children who share one allele in common. The use of such family groupings is however limited
by availability and by the high de novo mutation rate of 1.2% per gamete at this locus. Alternatively, individual alleles
from one person may be separated on the basis of size, using restriction digestion and preparative agarose gel
electrophoresis. This approach is time consuming, tedious and requires reasonably large amounts of DNA (minimum around
5 µg total genomic DNA) plus the need for a preliminary experiment to determine allele sizes. Moreover this approach
proves difficult for individuals with closely sized alleles and pseudohomozygous individuals. Some of these problems
may be obviated by single molecule dilution (SMD) and PCR recovery (Monckton and Jeffreys, 1991), but this
procedure has its own limitations, the main one being that it is applicable only to relatively small alleles that may be amplified
in their entirety.

We used single stranded conformational polymorphism (SSCP) analysis, DNA sequencing and inter-species
sequence comparisons to identify three common polymorphisms in the flanking DNA of MS32. The sequence
information thus gained was used to design PCR based diagnostic tests for allelic state and, through the use of allele specific
primers, haplotype specific MVR-PCR of MS32 alleles in heterozygous individuals (i.e. 'knockout' of one allele). We
also show that haplotypic primers may be used to obtain unambiguous individual specific diploid codes, or
unambiguous single allele codes, from mixed DNA samples, of obvious potential in forensic applications.

Materials and methods

General PCR assays :- PCR was performed using the buffer conditions and primer sequences and concentrations
previously described (Jeffreys et al., 1991; legend to Figure 2), and with the primer sequences as given in the legend
to Figure 21, using 100 ng of input genomic DNA in 7.5 µl reactions, unless stated otherwise. Cycling conditions were
1 minute denaturation at 96°C, 1 minute primer annealing at A°C and E minutes extension at 70°C.

DNA sequencing :- Single stranded template DNA was generated by asymmetric PCR (Gyllensten and Erlich,
1988) and sequenced in the presence of the detergent NP-40 by the di-deoxy chain termination method as previously
described (Bachman et al., 1990) using T7 polymerase (Pharmacia).

MVR-PCR :- this was carried out with the fixed flanking primers 32-O, 32-H2C or 32-D2 using an annealing
temperature (A) of 69°C and an extension time (E) of 5 minutes for 18 cycles, with all other procedures as previously described
(Jeffreys et al., 1991). Knockout MVR-PCR using the flanking primer 32-H1C was performed with an annealing
temperature (A) of 64°C for five cycles and 60°C for 13 cycles, again with all other procedures as previously described.

Results :- MS32 MVR-PCR analysis is directed from a unique sequence primer (32-O, 32-D or 32-B) located in the
5' flanking sequence of MS32 into the minisatellite array (Figure 21). The original λ clone containing MS32 includes only
a further 425 bp of DNA 5' to the first minisatellite repeat unit. This region was previously sequenced in the human clone
(Wong et al., 1987) and partially sequenced in a selection of primates (Gray and Jeffreys, 1991). To search for
polymorphisms in this region in humans, primer 32-OR was designed and used in conjunction with 32-B (Figures 21 and 22) to
amplify the 348 bp of DNA immediately flanking the most variable and unstable end of the minisatellite which is
analysed in MVR-PCR.

Identification of three common polymorphisms in the flanking DNA :- PCR amplification followed by restriction
digest analysis of this region from 12 unrelated Caucasian individuals revealed a HinfI restriction site dimorphism in this
region (designated as Hf+ for presence of the HinfI restriction site and Hf- for absence of the HinfI restriction site). This
region showed no polymorphisms using the restriction enzymes BglI, DdeI, Fnu4HI or AluI in the same 12 unrelated
individuals. Direct DNA sequencing of this region from PCR amplification products from a heterozygous individual
(Hf+/Hf-) and a single molecule separated Hf- allele from a second heterozygous individual (Monckton and Jeffreys, 1991)
revealed the polymorphism as a C (presence of HinfI restriction site, Hf+) to T transition (absence of HinfI restriction site,
Hf-) at position 143 (Figure 23, Table 4).

PCR-SSCP analysis (Orita et al., 1989) of the entire flanking region (32-OR to 32-B) in 8 CEPH parents
homozygous for Hf+ revealed another common polymorphism (see segregation analysis of family 1416, Figure 22A).
Direct DNA sequencing of the PCR product amplified from individuals homozygous for the two forms and their
heterozygous father showed the polymorphism to be due to a C to G transversion at position 80, designated Hump1
(HUMan Primate variant 1, alleles H1C and H1G) (Figure 23, Table 4). Further sequence comparisons between the sequences
obtained here and those obtained previously revealed another polymorphic site within this flanking region, a C to T
transition at position 241, designated Hump2 (HUMan Primate variant 2, alleles H2C and H2T) (Figure 23, Table 4).

Primate sequence comparisons :- Direct comparisons of the flanking region between the cloned human sequence
(Wong et al., 1987) and those previously obtained for Chimp, Gorilla and Orang-utans (Gray and Jeffreys, 1991)
allowed the derivation of a great ape/human ancestral sequence for this region (using Orang-utans as the outgroup)
(Figure 23). Nine sites of sequence divergence exist between man and the primate ancestral sequence and all three of
the described polymorphic sites so far identified are contained within this group (Table 4). We reasoned that the
observed differences between the cloned human sequence and the derived ancestral sequence were likely to be due
to mutation events that occurred subsequent to the human - great ape split, approximately 6-8 million years ago (Koop
et al., 1986). We further reasoned, assuming a fixation time of around 1 million years and a random timing for the
generation of new alleles within that 6-8 million years, that approximately 1/7 (i.e. 1-2) of the sites would have arisen in
the past 1 million years, thus would be unlikely to have progressed to fixation and could still be polymorphic within the
present human population. This type of analysis not only produces an estimate for the number of likely polymorphic
sites but also direct information as to their probable location. Most significantly it allows the prediction of easily
assayable restriction enzyme sites that differ between the human clone and the primate consensus. Obviously the success
of this approach is highly dependent on the initial human sequence obtained, since if the chromosome from which the
human sequence was gained carries the ancestral allele at a genuinely polymorphic site then such a site will not be
identified by this type of analysis. Of the nine sites of sequence divergence identified, six produced changes in
commonly available restriction enzyme sites (BglI, BspMI, HinfI and XbaI, see Table 4) and all were assayed in 20 unrelated
individuals amplified with primer pair 32-OR and 32-B. Other than the previously identified HinfI polymorphism, none of
the six sites examined were found to be commonly polymorphic. The base substitutions at sites 80, 94 and 241 do not
affect recognition sequences for any commonly available restriction enzymes. Sites 80 and 241 were previously shown
to be polymorphic by SSCP and sequence analysis (Hump1 and Hump2), whilst sequence analysis of seven amplified
human alleles and the human clone has not revealed the persistence of the ancestral allele at position 94.
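
The estimate of 1-2 recently arisen sites can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes, as the text does, that the nine divergent sites arose at a uniform rate over 6-8 million years and that a new allele takes about 1 million years to reach fixation.

```python
def expected_recent_sites(n_sites, fixation_time_my, divergence_time_my):
    """Expected number of divergent sites that arose within the last
    `fixation_time_my` million years, assuming mutations are spread
    uniformly over `divergence_time_my` million years."""
    return n_sites * fixation_time_my / divergence_time_my


# Nine divergent sites, ~1 million years to fixation, split 6-8 million years ago.
for divergence in (6.0, 7.0, 8.0):
    estimate = expected_recent_sites(9, 1.0, divergence)
    print(f"split {divergence:.0f} My ago: about {estimate:.2f} sites may still be polymorphic")
```

Under these assumptions the estimate stays between 1 and 2 sites across the whole 6-8 million year range, which is the figure quoted in the text.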

Assays for the polymorphisms and heterozygosity analysis :- As a simple restriction site dimorphism the Hf
polymorphism was very easily typed by standard PCR amplification (using primer pair 32-OR and 32-B) and subsequent HinfI
digestion (Figure 22B). Typing of this polymorphism across the 80 parents in the CEPH panel of families and across
101 unrelated Japanese individuals showed a heterozygosity level of 31% for this polymorphism in both populations
(Table 5).

Unfortunately neither Hump1 nor Hump2 created or destroyed restriction enzyme sites within the flanking region
and thus an alternative approach to determining allele status at these polymorphic sites was required. For Hump2 a
single tube four primer PCR assay was developed (Figure 22C). Two opposing primers specific for the two alternative
alleles were created, 32-H2C (for amplification from the H2C allele) and 32-H2AR (for amplification from the H2T allele),
and used in conjunction with the universal primers 32-OR and 32-B (see Figure 21). An individual homozygous for
the H2T allele produces a 259 bp band corresponding to the PCR product from primer pair 32-H2AR and 32-B, as well
as the 364 bp internal control band derived from the universal primers 32-OR and 32-B. In contrast an individual
homozygous for the H2C allele produces a 142 bp band corresponding to the PCR product from primer pair 32-H2C
and 32-OR, as well as the 364 bp internal control band. Heterozygous individuals (H2C/H2T) produce all three bands.
Typing of the Hump2 polymorphism across the 80 parents in the CEPH panel of families and across 101 unrelated
Japanese individuals showed heterozygosity levels of 48% and 16% respectively (Table 5).
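
The band pattern logic of the four primer assay can be written as a small lookup. This is a minimal sketch, not part of the described method: the function name and the treatment of a missing control band as a failed reaction are assumptions made for illustration, and only the three band sizes are taken from the text.

```python
CONTROL_BP = 364  # internal control band from the universal primers 32-OR and 32-B
H2T_BP = 259      # band reported for the H2T allele
H2C_BP = 142      # band reported for the H2C allele


def call_hump2(bands_bp):
    """Return a Hump2 genotype call from the band sizes (in bp) seen in one lane."""
    bands = set(bands_bp)
    if CONTROL_BP not in bands:
        # Assumption: without the control band the reaction is treated as failed.
        return "failed"
    has_c = H2C_BP in bands
    has_t = H2T_BP in bands
    if has_c and has_t:
        return "H2C/H2T"
    if has_c:
        return "H2C/H2C"
    if has_t:
        return "H2T/H2T"
    return "failed"


print(call_hump2([364, 259]))
print(call_hump2([364, 142]))
print(call_hump2([364, 259, 142]))
```

In practice band sizes would be read from a gel with some tolerance; exact matching is used here only to keep the sketch short.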

Unfortunately the Hump1 polymorphic site lies in a very A/T rich region of DNA (26% G/C in the 50 bp surrounding
Hump1) and an alternative strategy was required to assay this site. The mismatched primer 32-H1B primes just 5' to the
Hump1 polymorphism and forces the incorporation of a 3' terminal G rather than the A present in genomic DNA. Use
of this primer during low stringency PCR allows incorporation of this transition into resulting PCR products. This forced
insertional mutation creates or destroys an easily assayable Bsp1286I restriction enzyme site dependent on allelic
state at the Hump1 locus (H1G derived products amplified with 32-H1B are cut by Bsp1286I). Unfortunately the low
annealing temperature required to ensure the A/T rich 32-H1B primer incorporating the terminal mismatch primes
efficiently prevented the direct use of total genomic DNA as a PCR template. Thus a preliminary amplification of the entire
flanking region with primer pair 32-OR plus 32-B (as used in the Hf assay) was required to generate seed DNA for use
in a nested 32-NR (32-NR primes just 5' of 32-OR and acts as a nested primer directed into the 5' flanking DNA of
MS32) to 32-H1B amplification (Figure 21). Simple genotyping of this polymorphism was then achieved by Bsp1286I
digestion and agarose gel electrophoresis (Figure 22A). Typing of the Hump1 polymorphism across 40 parents in the
CEPH panel of families showed a heterozygosity of 43% (Table 5).
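
The heterozygosity levels quoted above can be compared with the values expected from the allele frequencies in Table 5. The sketch below assumes Hardy-Weinberg proportions (expected heterozygosity = 1 minus the sum of squared allele frequencies); the text does not state whether the reported percentages are observed or expected values, so the agreement is offered as a consistency check only. The function and variable names are illustrative.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in allele_freqs)


# Allele frequencies as listed in Table 5.
TABLE_5 = {
    ("Hump1", "Caucasian"): (0.69, 0.31),
    ("Hf", "Caucasian"): (0.81, 0.19),
    ("Hf", "Japanese"): (0.81, 0.19),
    ("Hump2", "Caucasian"): (0.59, 0.41),
    ("Hump2", "Japanese"): (0.91, 0.09),
}

for (locus, population), freqs in TABLE_5.items():
    percent = 100 * expected_heterozygosity(freqs)
    print(f"{locus:6s} {population:10s} {percent:.0f}%")
```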

Knockout MVR-PCR :- These flanking polymorphisms can be used to map individual MS32 alleles from total
genomic diploid DNA by the use of allele specific primers located in the flanking DNA. PCR primer 32-D2 spans the site
of the Hf polymorphism and was used as an allele specific MVR-PCR primer. Using 32-D2 as the fixed primer in the
flanking DNA it was possible to amplify only MS32 alleles linked to the Hf+ site, i.e. to 'knockout' the amplification of the
Hf- linked allele. For heterozygous individuals (Hf+/Hf-) it was possible to obtain the allele map from the Hf+ linked allele
direct from total genomic DNA (using primer 32-D2) and for the Hf- linked allele by subtraction of the Hf+ allele from
the diploid code derived from a standard MS32 MVR-PCR using a universal flanking primer (32-D, 32-O or 32-B)
(Figure 24).
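
The subtraction step can be sketched as follows. The representation is an assumption made for illustration: an allele is a string of repeat types ('a', 't', with '0' standing in for a null repeat or for positions beyond the end of the shorter allele), and a diploid code is a list holding one sorted two-letter entry per repeat position. The function names are hypothetical.

```python
def superimpose(allele_1, allele_2):
    """Build a diploid code: one sorted two-letter entry per repeat position."""
    length = max(len(allele_1), len(allele_2))
    padded_1 = allele_1.ljust(length, "0")
    padded_2 = allele_2.ljust(length, "0")
    return ["".join(sorted(pair)) for pair in zip(padded_1, padded_2)]


def subtract(diploid_code, known_allele):
    """Recover the second allele by removing the known allele from the diploid code."""
    known = known_allele.ljust(len(diploid_code), "0")
    other = []
    for position, (pair, repeat) in enumerate(zip(diploid_code, known), start=1):
        if repeat not in pair:
            raise ValueError(f"known allele is incompatible with the diploid code at repeat {position}")
        # Remove one copy of the known repeat type; what remains is the other allele.
        other.append(pair.replace(repeat, "", 1))
    return "".join(other)


diploid = superimpose("aattat", "tata")
print(diploid)
print(subtract(diploid, "aattat"))
```

Trailing '0' characters in the result mark positions past the end of the shorter allele under this padding convention; they cannot be told apart from genuine null repeats without further information.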

PCR primer 32-H2C can also be used as an allele specific MVR-PCR primer; using this as the fixed primer in the
flanking DNA it is possible to knockout H2T linked alleles and amplify only MS32 alleles linked to H2C. As with the Hf
polymorphism, in heterozygous individuals (H2C/H2T) it is possible to obtain the allele map from the H2C linked allele
direct from total genomic DNA (using primer 32-H2C) and for the H2T linked allele by subtraction of the H2C allele from
a standard MS32 MVR-PCR (Figure 24). Similarly the Hump1 specific primer 32-H1C may also be used for knockout
MVR-PCR in heterozygous individuals (Figure 24).

Haplotype analysis of flanking DNA polymorphisms :- Haplotypic analysis of the polymorphisms to each other and to
the minisatellite alleles may be achieved in a variety of ways. Pedigree analysis is the simplest, and has been applied
to the three flanking polymorphisms and the minisatellite array for 40 CEPH families. Haplotypes of each flanking
polymorphism with respect to the minisatellite array can be directly achieved by knockout MVR-PCR. PCR based systems
for direct haplotype analysis and detailed haplotype studies are on-going. Preliminary results, however, suggest that
significant linkage disequilibrium exists between all the polymorphic sites, but in no case is the observed disequilibrium
absolute. Results for the haplotyping of the Hump2 and Hf polymorphisms in the parents of the 40 CEPH families are
presented in Table 6. Based on these figures approximately 63% of Caucasian individuals are heterozygous at the
variant Hf and/or Hump2 sites and can therefore have single alleles mapped by knockout MVR-PCR.

Applications to mixed DNA samples :- As described above MS32 MVR-PCR is likely to have major applications in
forensic science, an application for which mixed DNA samples are often encountered, e.g. mixed victim and assailant's
blood in violent attacks, vaginal swabs in rape cases and mixed semen samples in multiple rape cases, or mixed
partner/rapist semen samples. We have shown above that ambiguous diploid codes may be derived from mixtures of DNA
down to approximately 10% admixture, and that in cases where a pure sample of one of the DNAs, e.g. victim, is
available a high level of exclusionary power is achieved (on average 99.9993% of false suspects excluded). Even in cases
where neither sample is available in a pure form valuable information to exclude false suspects may still be derived.
However, mixtures of DNA below 10% and mixtures of two or more DNAs are less amenable to standard MVR-PCR
analysis. To investigate the potential forensic applications of knockout MVR-PCR we simulated mixed DNA cases under two,
of many, possible scenarios: 1, the mixture of a H2C homozygous assailant with a H2T homozygous victim, allowing use
of the H2C specific primer 32-H2C to selectively amplify only the assailant's alleles, thus deriving the assailant's diploid
code; and 2, the mixture of a Hf+/Hf- heterozygous assailant with a homozygous Hf- victim, allowing use of the Hf+
specific primer 32-D2 to specifically amplify only one of the assailant's alleles. Two suitable individuals were identified, i.e.
a H2T/H2T, Hf-/Hf- 'victim' and a H2C/H2C, Hf+/Hf- 'assailant', and DNA mixtures from 1:1 to 1:200 (victim:assailant)
were made and MVR-PCR analysis performed with the appropriate primer combinations (see Figure 25). Allele specific
primer 32-H2C can be used to amplify unambiguously only the assailant's alleles down to mixtures of at least 1:10
(150 ng of victim DNA, 15 ng of assailant DNA). For the 1:50 mixture only the assailant's diploid code was seen, but some
variation in band intensity was observed as the lower limit for the quantity of input DNA was approached (only 3 ng of
specific input DNA). Below 1:50 mixtures extra cycles of PCR were required to maintain detectable levels of product, with
resulting increased background signal derived from mispriming from the victim's DNA; as a consequence unambiguous
information was no longer derived. Nevertheless it may be possible to derive an ambiguous code at mixtures far lower than
possible using standard MVR-PCR, especially if enough material is available to permit multiple amplifications allowing
derivation of a consensus code if stochastic loss of PCR products is observed for very small starting amounts of DNA.
The Hf+ specific primer (32-D2) shows less allele specificity than the Hump2 allele specific primer, but it does allow the
derivation of single allele codes for mixtures down to 1:2. Primer 32-D2 was not initially designed as an allele specific
primer but fortuitously spanned the Hf polymorphism. An alternative primer designed specifically to access the Hf
polymorphism should amplify more selectively and allow derivation of unambiguous codes at lower levels than achievable
with 32-D2.

The power of using single allele codes to identify individuals based on comparisons with their diploid MVR-PCR
code was also assessed. Each of the 411 different alleles in our present allele database was used to screen the diploid
database of 408 unrelated individual codes; the number of exclusions per false suspect is plotted in Figure 26. 99.87%
of false suspects were excluded using information from the first 50 repeats, with a mean of 10.7 exclusions per case.
However, many of the alleles in our database were derived from mother-father-single child trios and thus contain some
ambiguous positions; this situation does not accurately reflect the circumstances likely to arise in genuine forensic
applications where the code of the allele under test will have been generated unambiguously by knockout MVR-PCR.
We therefore repeated this analysis using 235 completely mapped alleles and, as expected, the level of exclusion rose
slightly to 99.9%, with a mean of 11.3 exclusions per case. The power of exclusion for any one allele though was not
uniform, with the majority of alleles excluding all false suspects (96.11% and 96.60% respectively for total and
unambiguous allele databases), and with the major loss in overall exclusionary power being due to a limited subset of
alleles with poor discriminatory power. Those alleles which failed to exclude greater than 99% of false suspects were found
upon examination to be 'a' rich homogeneous alleles (i.e. almost completely comprised of a-type repeats, data not shown).
Nonetheless, even the worst unambiguous allele still managed to exclude greater than 95% of false suspects, an
exceptionally high level for the worst case scenario of one allele of one locus. In summary more than 98.5% of single
alleles exclude more than 99% of false suspects.
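
The screening of a single allele code against a diploid database can be sketched as below. The data layout is an assumption for illustration: an allele is a string of repeat types, a diploid code is a list with one two-letter entry per repeat position, and '?' is a hypothetical marker for an ambiguous position that is skipped. A position counts as an exclusion when the allele's repeat type is absent from the diploid entry at that position.

```python
def count_exclusions(allele_code, diploid_code, max_repeats=50):
    """Count positions where the allele cannot be one of the two alleles
    that make up the diploid code (first `max_repeats` positions only)."""
    exclusions = 0
    for repeat, pair in list(zip(allele_code, diploid_code))[:max_repeats]:
        if repeat == "?" or "?" in pair:
            continue  # ambiguous position, carries no information
        if repeat not in pair:
            exclusions += 1
    return exclusions


def fraction_excluded(allele_code, diploid_database, max_repeats=50):
    """Fraction of diploid codes excluded by at least one mismatching position."""
    excluded = sum(
        1 for code in diploid_database
        if count_exclusions(allele_code, code, max_repeats) > 0
    )
    return excluded / len(diploid_database)


toy_database = [["aa", "at", "tt"], ["tt", "aa", "aa"]]
print(fraction_excluded("aat", toy_database))
```

The toy database above is invented for the example and does not reproduce any figure from the allele or diploid databases described in the text.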

Using the Hf and Hump2 haplotype frequencies derived from the analysis of the 40 CEPH families (160 haplotypes)
an approximate estimate for the number of mixed DNA samples to which unambiguous diploid or single allele mapping
could theoretically be applied using the Hf and Hump2 discriminatory system can be calculated (see Table 7; this
analysis assumes the mixes are of sufficient quality and in reasonable proportions to allow unambiguous MVR-PCR to be
performed). It can be seen that in approximately 25% of cases an unambiguous diploid code would be derivable from
a mixed DNA sample, and in up to 50% of cases either diploid code, or single allele, information would be recoverable.
Use of the Hump1 polymorphism in this type of analysis should further improve the proportion of mixed DNA scenarios
to which MVR-PCR based identification could be applied.

Thus far we have identified three common polymorphisms in the immediate 348 bp of DNA flanking the minisatellite
locus D1S8. For each polymorphic site we have developed rapid and reliable PCR based tests for allelic state and have
determined allele frequencies in two major populations. Each locus appears to be in Hardy-Weinberg equilibrium, whilst
significant, but not absolute, linkage disequilibrium exists between sites. The use of such polymorphic sites to design
allele specific primers has been demonstrated as well as their use in single allele or knockout MVR-PCR. With a
combined heterozygosity in the flanking DNA in excess of 63%, the large scale mapping of separate alleles in large
numbers of unrelated individuals becomes feasible, with obvious potential for the generation of large allele databases, allele
groupings and possible derivation of allele and human lineages. Mapping of more alleles and concurrent haplotyping of
the flanking polymorphisms should shed more light on the mutation processes involved in maintaining ultravariability at
this locus. It will also help to assess the extent to which interallelic exchange is involved in the generation of new alleles,
and to determine whether or not a local recombinational hotspot is indeed present at this locus. The identification of
additional polymorphisms in the flanking DNA of MS32 will further increase the proportion of individuals heterozygous
for at least one of the flanking sites, both increasing the number of single allele maps directly obtainable and providing
more flanking DNA markers for the detailed analysis of the molecular processes operating at this hypermutable locus.

The existence of additional unknown flanking polymorphisms which affect 'universal' flanking primers (32-O, 32-D
and 32-B) could lead to inadvertent allele knockout during MVR-PCR (as originally found for the flanking primer 32-D2)
and the generation of incorrect diploid code. However, such knockout of an allele will produce an apparently
homozygous pattern devoid of heterozygous (a/t) positions; such patterns are easily identified and such apparently
homozygous individuals can be retested with other flanking primers to check for true homozygosity (or possibly
heterozygosity for a null MS32 allele carrying a deletion of flanking DNA and flanking primer sites, though no such allele
has been identified).

A preliminary study of the potential forensic applications of knockout MVR-PCR in analysing mixed DNA samples
has also been described, although a more rigorous and extensive study is needed to confirm the full scope of such
applications. The optimization of PCR primer allele specificity and the characterisation of additional polymorphisms
should increase the proportion of mixed DNA samples to which MVR-PCR can be applied. The application of knockout
MVR-PCR to multiple mixed DNAs has not been tested directly, but they too should prove tractable, although the
probability of obtaining unambiguous codes will decrease as the number of DNAs involved increases. Knockout MVR-PCR
under some circumstances can be used to obtain information for mixtures containing as little as 1% admixture of DNA;
this represents a considerable improvement over other techniques such as Southern blot hybridisation using single
locus hypervariable probes. Mixed DNA samples also occur in analytical contexts other than forensics, e.g. monitoring
of transplant success in bone-marrow transplants, and such situations should prove amenable to the same techniques.

We have also investigated the potential use of primate consensus sequences to pin-point sites of potential variation
in present day human populations. Although unsuccessful in further increasing the number of polymorphic sites found
in this investigation, an initial analysis would have identified the three sites now known to be polymorphic in this region.
We note that where primate sequence information already exists it may be used to more rapidly target potentially
polymorphic sites in humans.

Table 4

Human/primate ancestral sequence variant sites in the MS32 flanking region

No.  Position  Human  Human/Ancestral  Human Clone/Ancestral         Polymorphism   Polymorphic
               Clone  Sequence         Restriction Site Differences  in Caucasians  Locus Name
1    80        C      G                none                          +              Hump1
2    94        G      A                none
3    127       C      A                XbaI +/-
4    143       C      T                HinfI +/-                     +              Hf
5    197       G      T                HinfI +/-
6    207       A      G                BspMI -/+
7    241       C      T                none                          +              Hump2
8    309       G      A                BglI +/-
9    319       C      T                BglI +/-

*Only 7 chromosomes have been analysed for this locus.
Table 5

MS32 flanking polymorphism allele frequencies

                 Caucasian            Japanese
Locus   Allele   Frequency  Number    Frequency  Number
Hump1   G        0.69       55        ND         ND
        C        0.31       25        ND         ND
Hf      +        0.81       129       0.81       163
        -        0.19       31        0.19       39
Hump2   C        0.59       94        0.91       184
        T        0.41       66        0.09       18

ND = Not done



Table 6

Caucasian haplotype frequencies for the Hf and Hump2 polymorphisms

Hf-Hump2 Haplotype   Frequency   Observed Number
+C                   0.54        86
-C                   0.05        8
+T                   0.27        43
-T                   0.14        23

*χ² (1 df) = 17.85, a significant deviation from a null hypothesis of random association.

Table 7

Theoretical estimation of the level of information obtainable from mixed DNA samples
using the Hf and Hump2 allele specific primers in MVR-PCR

Victim's               Assailant's haplotypes
haplotypes             +C    +C    +C    +C    -C    -C    -C    +T    +T    -T
                       +C    -C    +T    -T    -C    +T    -T    +T    -T    -T
            f (%)      28.9  5.4   28.9  15.5  0.3   2.7   1.4   7.2   7.7   2.1

+C          28.9       0     1     1     1     2*    2     2     2*    2     2*
+C                     8.4   1.6   8.4   4.5   0.1   0.8   0.4   2.1   2.2   0.6

+C          5.4        0     0     1     1     0     1     1     2*    2     2*
-C                     1.6   0.3   1.6   0.8   0     0.1   0.1   0.4   0.4   0.1

+C          28.9       0     1     0     1     2*    1     2*    0     1     2*
+T                     8.4   1.6   8.4   4.5   0.1   0.8   0.4   2.1   2.2   0.6

+C          15.5       0     0     0     0     0     0     0     0     0     0
-T                     4.5   0.8   4.5   2.4   0     0.4   0.2   1.1   1.2   0.3

-C          0.3        2*    1     2     2     0     1     1     2*    2     2*
-C                     0.1   0     0.1   0     0     0     0     0     0     0

-C          2.7        0     0     0     0     0     0     0     0     0     0
+T                     0.8   0.1   0.8   0.4   0     0.1   0     0.2   0.2   0.1

-C          1.4        2*    1     2*    1     0     1     0     2*    1     0
-T                     0.4   0.1   0.4   0.2   0     0     0     0.1   0.1   0

+T          7.2        2*    2     1     2     2*    1     2     0     1     2*
+T                     2.1   0.4   2.1   1.1   0     0.2   0.1   0.5   0.6   0.1

+T          7.7        2*    2*    1     1     2*    1     1     0     0     0
-T                     2.2   0.4   2.2   1.2   0     0.2   0.1   0.6   0.6   0.2

-T          2.1        2*    2     2     1     2*    2     1     2*    1     0
-T                     0.6   0.1   0.6   0.3   0     0.1   0     0.1   0.2   0

Notes: The upper figure is the number of assailant's alleles for which code could be derived
(where information on both alleles is recoverable = at least one allele separately recoverable)
and the lower figure is an estimate for the percent likelihood of this scenario being
encountered. Haplotype frequencies (f) are based on a sample of 160 Caucasian chromosomes
(Table 6).

EXAMPLE 18

MVR-PCR analysis of MS31A

The D7S21 locus (MS31A) is a minisatellite with an allele size range of 2-13 kb. It exhibits very high (99%) allele
length heterozygosity that reflects extreme variability in tandem repeat copy number. Sequence analysis of MS31A
alleles reveals that, like most minisatellites, there are polymorphic positions within the repeat units generating minisatellite
variant repeat units (MVRs). However, MS31A is atypical in that all repeat units so far characterised at this locus have
the same length (20 bp) (Wong et al., 1987). These attributes suggested that MS31A would be an ideal candidate for
internal mapping of minisatellite repeat unit variation by applying the same MVR-PCR technique as used for D1S8
(MS32) (Jeffreys et al., 1991).

An MS31 allele, cloned from a λ library of size fractionated human genomic DNA, has been sequenced. The Sau3AI
fragment has 402 bp of 5' flanking DNA, followed by a large number of minisatellite repeats (MS31A) which are
separated by 15 bp from an adjacent minisatellite (MS31B). The latter is truncated in cloned MS31 due to the presence of a
Sau3AI site in one of its repeat units (Armour et al., 1989). Almost all of the variability at MS31 is due to repeat copy
number variation at MS31A. Sequence analysis of cloned MS31A has revealed two adjacent sites of base substitutional
polymorphism (G/A followed by C/T) in its repeat units. The second of these is potentially more informative for
MVR-PCR, since MS31A alleles contain roughly even numbers of these two types (C/T) of repeat units. The map of the MS31
locus (Fig. 27) indicates that one end of MS31A alleles is far more amenable to MVR mapping than the other, and that
the C/T polymorphism should be directly accessible for analysis by mapping from this 5' end. The proximity of MS31B
to MS31A would make it difficult to design flanking PCR primers complementary to the 3' end of MS31A. Furthermore,
access to the more informative variant repeat unit polymorphism would require the design of degenerate MVR-PCR
primers spanning the less informative variant position within the repeats. Another advantage of assaying internal repeat
unit variation at the 5' end of MS31 is that the existence of polymorphic sites in the flanking DNA can be exploited in the
design of allele specific flanking primers. One such site, generating an AluI RFLP (Fig. 27) originally identified by
Southern blot analysis, has been used in this way.

Methods :- All PCR reactions used the buffer system described previously (Jeffreys et al., Cell, 1990, 473-485).
MS31 MVR-PCR was performed as follows. 50-100 ng of genomic DNA, or the equivalent quantity of a single MS31
allele separated from an MboI digest of genomic DNA by preparative gel electrophoresis, was used as the template in
7 µl MVR-PCR reactions using the primers 31A and Tag at a concentration of 1 µM plus either 40 nM 31-Tag-A or 20 nM
31-Tag-G and 0.25 units AmpliTaq (Perkin-Elmer-Cetus). Amplification was carried out by denaturing at 96°C for 1.3
min followed by annealing at 69°C for 1 min and extension at 70°C for 3 min, repeated for 22 cycles and followed by a
chase of 67°C for 1 min and 70°C for 10 min. PCR products and 1 µg φX174 DNA x HaeIII size markers were
electrophoresed through a 35 cm long 1.2% agarose (Sigma Type 1) gel in 89 mM Tris-borate (pH 8.3), 2 mM EDTA, 0.5 µg/ml
ethidium bromide (TBE), until the 118 bp marker band reached the end of the gel. The gel was then Southern blotted for
2 hours using Hybond-N FP (Amersham) hybridisation transfer membrane. The membrane was dried and the DNA
crosslinked to it by exposure to UV radiation from a transilluminator for 40 secs. It was then prehybridised at 65°C for
30 mins in 20 ml 0.5 M Na phosphate (pH 7.2), 7% SDS, 1 mM EDTA and hybridised at 65°C overnight in 20 ml of the
same solution containing 32P-oligolabelled probe (the 4.5 kb Sau3AI minisatellite insert isolated from clone pMS31;
Wong et al., Ann. Hum. Genet., 1987, 51, 269-288). The membrane was washed in a total of 100 ml 0.1 x SSC, 0.01%
SDS, with changes of washing solution every 10 mins. Visualisation was carried out by autoradiography overnight at
-70°C without an intensifier screen.

MS31 MVR-PCR on separated alleles :- To carry out MVR-PCR on MS31A, two MVR-specific primers were
designed, 31-Tag-A and 31-Tag-G (Fig. 27). These primers comprise 19 nucleotides complementary to the minisatellite
repeat unit terminating at the C/T polymorphic MVR site and are preceded by the Tag sequence identical to that used
in MS32 MVR-PCR above. Use of low concentrations of one or other of these primers coupled with high concentrations
of the Tag driver primer and a primer at a fixed site in the 5' flanking DNA (31OR, 31A, 31C, 31F, 31AluI+, 31AluI-; see
Fig. 27) should generate sets of MVR-PCR products extending from the flanking site to each variant repeat unit of a
particular type.

34 Caucasian MS31A alleles were separated by preparative gel electrophoresis from Sau3AI digests of genomic
DNA. Each allele was amplified by MVR-PCR and the products detected by Southern blot hybridisation with 32P-labelled
MS31 probe (Fig. 28). In each case complementary ladders of PCR products were generated from 31-Tag-A
and 31-Tag-G, from which the allele codes could easily be scored. In some cases allele codes could be read for over
100 repeat units into the tandem repeat array.

Repeat unit composition of MS31A alleles :- Each mapped allele was encoded as a string of a-type and t-type
repeat units. a-type repeat units are detected by 31-Tag-A and carry the T base at the polymorphic C/T site; t-type
repeats carry the C variant and are detected by 31-Tag-G. This coding ensures compatibility with computer software
developed for MS32 MVR-PCR coding.
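
Scoring a single allele from the two complementary ladders can be sketched as follows. This is an illustration only: the function name and the input format (lists of repeat positions at which a band is seen in each lane) are assumptions, and '0' is used here as a stand-in for a repeat that gives no band in either lane.

```python
def encode_allele(a_lane_positions, g_lane_positions, n_repeats):
    """Encode one allele as a string of 'a', 't' and '0' characters.

    a_lane_positions: repeat positions with a band in the 31-Tag-A lane (a-type)
    g_lane_positions: repeat positions with a band in the 31-Tag-G lane (t-type)
    n_repeats: number of repeat positions that could be read from the gel
    """
    a_lane = set(a_lane_positions)
    g_lane = set(g_lane_positions)
    if a_lane & g_lane:
        # A single allele should give complementary ladders.
        raise ValueError("a band in both lanes at one position is not expected for a single allele")
    code = []
    for position in range(1, n_repeats + 1):
        if position in a_lane:
            code.append("a")
        elif position in g_lane:
            code.append("t")
        else:
            code.append("0")  # no band in either lane
    return "".join(code)


print(encode_allele([1, 2, 5], [3, 4], 6))
```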

In contrast to MS32, MS31A alleles contain a good balance of the two repeat unit types (Table 8) and these are
evenly interspersed along alleles (Fig. 28), giving fewer clusters of a particular repeat type and fewer alleles in which
one repeat unit type predominates. There are also fewer short alleles at MS31A. As with MS32 a small proportion
(around 2%) of repeat units fail to amplify with either MVR-specific primer, indicating the presence of additional "null" or
O-type variant repeats. These O-type repeats tend to cluster in a limited number (10/34) of alleles, some of which are
clearly related. However, additional variants which quantitatively affect amplification efficiency, and hence band intensity,
also exist (see region bracketed in Fig. 28). These, as yet uncharacterised, variants do not affect the ability to score
allele codes or diploid codes from total genomic DNA (see below).

Allelic variability in MS31A allele codes :- The 34 alleles so far typed all have different MVR maps. To identify related
alleles, which share regions of map similarity, all possible pairwise comparisons of allele codes were made by dot matrix
analysis (Fig. 29). Only three significantly related alleles were found (Fig. 30). These show most inter-allelic variability
in repeat copy number and interspersion pattern at the extreme 5' ends of the tandem repeat array, and are almost
identical along the rest of their length which extends to the end of the mapped region. All three alleles are of similar overall
length (around 100 repeat units) as determined by Southern blot hybridisation of total genomic DNA (data not shown).
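
A dot matrix comparison of two allele codes can be sketched as below. The window size, the exact-match rule and the summary by longest diagonal run are assumptions made for illustration; the parameters of the analysis shown in Fig. 29 are not given in the text.

```python
def dot_matrix(code_1, code_2, window=5):
    """matrix[i][j] is True when the `window` repeats starting at position i
    of code_1 are identical to those starting at position j of code_2."""
    rows = len(code_1) - window + 1
    cols = len(code_2) - window + 1
    return [
        [code_1[i:i + window] == code_2[j:j + window] for j in range(cols)]
        for i in range(rows)
    ]


def longest_diagonal_run(matrix):
    """Length of the longest unbroken diagonal of dots, a rough measure of
    how long a stretch of the two maps is shared."""
    best = 0
    previous = []
    for row in matrix:
        current = []
        for j, dot in enumerate(row):
            if dot:
                length = 1 + (previous[j - 1] if j > 0 and previous else 0)
            else:
                length = 0
            current.append(length)
            best = max(best, length)
        previous = current
    return best


allele = "ataattatta"
shifted = "tt" + allele  # same map with two extra repeats at the 5' end
print(longest_diagonal_run(dot_matrix(allele, shifted, window=4)))
```

Related alleles that differ only at their extreme 5' ends, as described above, would show up as a long diagonal displaced from the main diagonal by the difference in repeat number.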

MVR-PCR on total genomic DNA :- MS31A MVR-PCR can be applied to genomic DNA to reveal the digital code
derived from both alleles superimposed, in the same way as at MS32 (Fig. 31). The extreme allelic variability and better
mixed interspersion pattern of variant repeat units make it likely that MS31 diploid codes will be even more diverse
than those seen at MS32. Furthermore, combinations of primers can be used to generate diploid codes from MS31 and
MS32 alleles simultaneously. This "duplex MVR-PCR" has been tested using MVR-PCR primers 32-Tag-C, 32-Tag-T
and 31-Tag-A, 31-Tag-G, along with the appropriate flanking primers and Tag, using the same PCR conditions as
employed for each locus separately. 31-Tag-A and 32-Tag-C were used in one PCR reaction with 31-Tag-G and 32-Tag-T
in the other, to maintain the conventional order of a-type and t-type repeat unit lanes on MVR-PCR gels. Southern
blot analysis by sequential hybridisation with MS31, followed by MS32, showed complete sets of PCR products from
each locus with no evidence of inter-locus interference or cross-hybridisation (Fig. 31), indicating that repeat units from
both loci amplify independently.
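
A minimal sketch of how two single allele codes superimpose to give a diploid code is given below. The numeric symbols assigned to each pair of repeat unit types (1 to 6), the function name and the treatment of positions beyond the end of the shorter allele as O-type are assumptions made for illustration only.

```python
from itertools import zip_longest

# Assumed symbol for each unordered pair of repeat unit types
# (a = a-type, t = t-type, 0 = O-type or no repeat at that position).
DIPLOID_SYMBOL = {
    frozenset("a"): "1",   # a/a
    frozenset("t"): "2",   # t/t
    frozenset("at"): "3",  # a/t
    frozenset("a0"): "4",  # a/O
    frozenset("t0"): "5",  # t/O
    frozenset("0"): "6",   # O/O
}


def diploid_code(allele_1, allele_2):
    """Superimpose two allele codes position by position, padding the
    shorter allele with '0' beyond its last repeat unit."""
    pairs = zip_longest(allele_1, allele_2, fillvalue="0")
    return "".join(DIPLOID_SYMBOL[frozenset(pair)] for pair in pairs)


# Hypothetical alleles of 5 and 4 repeat units.
example = diploid_code("aatt0", "atat")
```

Because the pair of repeat types at each position is unordered, the same diploid code is obtained whichever allele is listed first.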

The MS31A codes are generally the more informative; see, for example, the MS31A and MS32 profiles of individual 9,
whose MS32 code is largely dominated by repeat unit positions homozygous for a-type repeats.

Flanking polymorphisms and "knockout" MVR-PCR :- If polymorphisms can be found in the DNA flanking the 5' end
of MS31 alleles, allele-specific flanking primers can be designed to allow the selective mapping of single alleles from the
total genomic DNA of individuals heterozygous for these polymorphisms, without the need for allele separation prior to
mapping (allele "knockout").

Southern blot analysis of genomic DNA from several individuals revealed the presence of a polymorphic AluI site
400bp inside the Sau3AI fragment spanning MS31 (data not shown). The sequence of cloned MS31A 5' flanking DNA
reveals a candidate cryptic AluI site 398bp from the 5' Sau3AI site and 2bp from the first repeat unit (Fig. 27). To determine
whether variation at this site was responsible for the polymorphism, DNA was tested from three individuals characterised
by Southern blot analysis as AluI+/+ and AluI-/- homozygotes and an AluI+/- heterozygote. Single allele codes of
both alleles in all of these individuals were available. 1µg of each of these DNAs was digested with 10 units of AluI and 10ng
of digested DNA was amplified in an MVR-PCR reaction using the flanking primer 31 OR, which binds just 5' to the suspected
AluI site (Fig. 1). The results confirmed that this is the location of the polymorphism: the AluI-/- homozygote gave a normal
diploid code, the AluI+/+ homozygote yielded no MVR-PCR products and the AluI+/- heterozygote produced a single
allele code identical to that previously determined for one of his alleles (data not shown).

A PCR assay for the AluI polymorphism was developed, based on the ability to generate diagnostic DNA fragments
by AluI digestion of PCR products containing the site. The flanking region extending into the minisatellite array was
amplified from total genomic DNA using 31-Tag-A at high concentration plus flanking primer 31A. Because of the primer
concentrations and short extension time employed, only PCR products extending to the first few repeat units were
amplified to levels detectable by staining with ethidium bromide, with the fragment corresponding to amplification from
the first repeat unit predominating. Cleavage of an AluI+ allelic PCR product with AluI will generate a 95bp DNA fragment
extending from the 31A primer site to the AluI site. AluI- alleles will not be cleaved, and heterozygotes will show both
cleaved and intact PCR products. Examples of this assay are shown in Fig. 32. Analysis of 78 unrelated Caucasians
and 82 unrelated Japanese showed that the AluI polymorphism is common in both populations (0.15 and 0.26 frequency
of the AluI+ allele, respectively).
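
As a rough guide to how many individuals would be heterozygous at this site, and therefore amenable to allele knockout, the expected heterozygote frequency can be estimated from the allele frequencies given above. The sketch below assumes Hardy-Weinberg proportions, which is an assumption made here and not a result reported above; the function and variable names are hypothetical.

```python
def expected_heterozygosity(p):
    """Expected heterozygote frequency 2pq for a two-allele site,
    assuming Hardy-Weinberg proportions (q = 1 - p)."""
    return 2 * p * (1 - p)


# Allele frequencies quoted in the text for the two population samples.
allele_frequency = {"Caucasian": 0.15, "Japanese": 0.26}

heterozygosity = {
    population: expected_heterozygosity(p)
    for population, p in allele_frequency.items()
}
# Under this assumption roughly a quarter of Caucasians and somewhat
# over a third of Japanese would be expected to be heterozygous.
```

The estimate is the same whichever of the two alleles the quoted frequency refers to, since 2pq is symmetric in p and q.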

To determine the molecular basis of the AluI polymorphism, 30 cycle MVR-PCR was conducted on AluI+/+ and
AluI-/- individuals. PCR products were resolved by agarose gel electrophoresis and visualised by staining with ethidium
bromide. The lower band from each ladder was electroeluted, reamplified using 31A and Tag primer and directly
sequenced (Winship, NAR, 1989, 17, 1266). The polymorphic AluI site was revealed as AGCT in the AluI+ allele and
GGCT in the AluI- form; the A/G transition is located 4bp upstream of the first minisatellite repeat (Fig. 27).

A pair of flanking primers, differing only in an "A" (31AluI+) or "G" (31AluI-) at their 3' ends, which corresponds to
the variable base, were designed for "knockout" MVR-PCR (see legend to Fig. 27 for sequences). When used in MVR-PCR
reactions at an appropriate annealing temperature (68°C) these primers discriminate between the two alleles in
AluI+/- heterozygotes, allowing selective mapping of one or other MS31A allele from total genomic DNA (data not
shown).
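
The principle of allele knockout can be sketched as below. This is a deliberately idealised model, assuming that a flanking primer is extended only from alleles whose variable base matches its 3' terminal base; the dictionary, the function name and the genotype notation are hypothetical, and real discrimination depends on the annealing conditions.

```python
# Variable base at the polymorphic site in each allelic form
# (AGCT in the AluI+ allele, GGCT in the AluI- allele).
VARIABLE_BASE = {"AluI+": "A", "AluI-": "G"}


def alleles_amplified(primer_3prime_base, genotype):
    """Return the alleles of `genotype` that an allele-specific flanking
    primer ending in `primer_3prime_base` is expected to map."""
    return [
        allele for allele in genotype
        if VARIABLE_BASE[allele] == primer_3prime_base
    ]


heterozygote = ("AluI+", "AluI-")
mapped_with_A = alleles_amplified("A", heterozygote)
mapped_with_G = alleles_amplified("G", heterozygote)
```

In a heterozygote each primer therefore selects a single allele, while a homozygote for the non-matching allele would be expected to yield no product.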

The successful development of MS31A MVR-PCR provides a powerful adjunct to MS32 digital coding, particularly
since both loci can now be typed simultaneously, which substantially increases the speed with which reference diploid
databases can be constructed. As further minisatellites amenable to MVR-PCR are discovered, multiplex MVR-PCR
may become possible as long as no cross-priming of repeat units occurs and PCR parameters are similar for all loci
involved.

Duplex, and ultimately multiplex, MVR-PCR will also be important for distinguishing close relatives, in particular siblings,
who have a 1/4 chance of sharing the same parental alleles and therefore diploid codes at a given locus. In paternity
cases, where it is possible that a paternal exclusion at one locus could be due either to a new mutation in one of the
minisatellite alleles, or to non-paternity, typing at additional loci will almost certainly distinguish between these possibilities.
It will also improve the typing of degraded DNA by increasing the amount of information recoverable from the limited
number of repeat units which can be scored in the coding ladder of such samples.
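
The benefit of typing more than one locus for sibling discrimination can be illustrated with a short calculation. The sketch assumes that the loci are inherited independently, that all four parental alleles at each locus have distinct codes and that no mutation occurs; these are simplifying assumptions and the function name is hypothetical.

```python
from fractions import Fraction


def sibling_match_probability(n_loci):
    """Chance that two siblings inherit the same pair of parental alleles
    at every one of `n_loci` independently inherited loci."""
    return Fraction(1, 4) ** n_loci


single_locus = sibling_match_probability(1)   # one locus typed
duplex = sibling_match_probability(2)         # two loci typed together
```

Under these assumptions each additional locus reduces the chance of an uninformative sibling comparison by a further factor of four.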

Single allele coding provides basic information on MS31A variability. MS31A allele coding can now be carried out
by analysis of physically separated alleles, or much more easily by allele knockout on AluI+/- heterozygotes. Sequence
analysis of the MS31 flanking region is currently under way to search for more sites of variation and thereby increase
the range of individuals to whom knockout MVR-PCR can be applied. Knockout MVR-PCR has potential forensic applications,
for example by selectively ablating the victim's alleles in victim/assailant DNA mixtures.

Unfortunately the deduction of single allele codes from the ternary codes generated by MVR-PCR of genomic DNA
in families, which was so useful in constructing a database of single MS32 alleles, is not straightforward at MS31A. At
MS31A the existence of O-type repeats combined with the presence of variant repeats which affect band intensity makes
it impossible to use band intensity (dosage) to distinguish, for example, a/O repeat positions in genomic DNA from
homozygous a/a positions. Such incorrect genotyping can lead to apparent parental exclusions or incorrect allele
codes; this not only interferes with the deduction of haplotypes from pedigree data, but would also create problems in
paternity testing by MVR-PCR. It might be possible to solve this problem by sequence characterisation of these additional
variant repeats and the use of additional MVR-PCR primers corresponding to O-type repeat units. In the meantime,
an alternative is to use allele knockout MVR-PCR in those families where parents are AluI+ and AluI- homozygotes
respectively or where one or both parents are AluI+/- heterozygotes. In appropriate families, single allele codes of all
four parental alleles can be determined in this way.
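
The family selection criterion stated above can be written as a simple check. The genotype notation ('+/+', '+/-', '-/-') and the function name are hypothetical and serve only to restate the criterion; the sketch makes no claim about how many parental alleles can actually be resolved in a given family.

```python
def knockout_family_candidate(parent_1, parent_2):
    """Return True if the parental genotypes at the flanking site meet the
    stated criterion: opposite homozygotes, or at least one heterozygote."""
    genotypes = {parent_1, parent_2}
    opposite_homozygotes = genotypes == {"+/+", "-/-"}
    any_heterozygote = "+/-" in genotypes
    return opposite_homozygotes or any_heterozygote


candidate = knockout_family_candidate("+/+", "-/-")
not_candidate = knockout_family_candidate("+/+", "+/+")
```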

Most MS31A alleles are long (>100 repeat units) and thus individual heterozygosity for short alleles will be rare;
such heterozygotes can be identified by reduction of the diploid coding ladder to hemizygosity beyond the end of the
shorter allele, with loss of heterozygous a/t repeat positions. However, distinguishing hemizygosity from homozygosity
over such coding regions requires interpretation of band intensity (dosage), which can be problematical at MS31A. Correct
heterozygous null scoring is irrelevant for individual identification, and the presence of reproducible band intensity
fluctuations at MS31 may even enhance this application, but is important in paternity analysis. Southern blot analysis
of 80 unrelated Caucasians has shown that the shortest MS31A alleles still contain around 90 repeat units. Only one
allele shorter than this has ever been found. This allele was too small to be detected by Southern blot analysis of
genomic DNA (a "null" allele; Armour et al., 1992) but was revealed by PCR amplification to be approximately 25 repeat
units in length.

Preliminary surveys of allelic variability at MS31A have revealed extraordinary levels of MVR code variation, to the
extent that most alleles are devoid even of regions of significant MVR code similarity. Interestingly, the only three alleles
that are related (Fig. 30) show most MVR haplotype differences restricted to the extreme beginning of the tandem
repeat array. This is analogous to the gradient of variability along minisatellite alleles seen at MS32 and MS205 (unpublished
data), which has been shown to arise from a mutation hotspot localised to the beginning of the tandem repeat
array, at which most spontaneous mutational change in repeat copy number, and therefore in the MVR map, occurs.

Table 8

                                        a-type   t-type   O-type   Total
Number of repeats                         1279      962       43    2284
Proportion of repeat unit types (%)         56       42        2     100
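
The percentages in Table 8 follow directly from the repeat unit counts, as the short check below shows. The variable names are hypothetical; the counts are those given in the table.

```python
# Repeat unit counts from Table 8.
counts = {"a-type": 1279, "t-type": 962, "O-type": 43}

total = sum(counts.values())

# Proportion of each repeat unit type, rounded to the nearest percent.
percent = {
    repeat_type: round(100 * n / total)
    for repeat_type, n in counts.items()
}
```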
SEQUENCE LISTING

GENERAL INFORMATION 



(i) APPLICANT: 

(ii) TITLE OF INVENTION: METHOD OF CHARACTERISATION 

(iii) NUMBER OF SEQUENCES : 57 

(iv) CORRESPONDENCE ADDRESS: 



(A) ADDRESSEE: 

(B) STREET: 

(C) CITY: 

(D) STATE: 

(E) COUNTRY: 

(F) ZIP: 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette, 3.50 inch, 1.2 Mb storage

(B) COMPUTER: IBM PS/2

(C) OPERATING SYSTEM: PC-DOS 3.20 

(D) SOFTWARE: ASCII from UPS-PLUS 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER:

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NO. 9118371.5 

(B) FILING DATE: 27-Aug-1991

(A) APPLICATION NO. 9119089.2

(B) FILING DATE: 06-Sep-1991

(A) APPLICATION NO. 9124636.3 

(B) FILING DATE: 20-Nov-1991 
(A) APPLICATION NO. 9207379.0

(B) FILING DATE: 03-Apr-1992

(A) APPLICATION NO. 9212627.5

(B) FILING DATE: 15-Jun-1992

(A) APPLICATION NO. 9212881.8

(B) FILING DATE: 17-Jun-1992

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCAGATGGAG CAATG 



(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGAGTCACCC CTGGC 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGAGTCACCC CTGGT 

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCAGATGGAG CAATGGCC 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TTTGGTGCTG AAAAGAAAG 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

AGGTGGAGGG TGTCTGTGA 

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGGTGGAGGG TGTCTGTGA 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

AGGCCTGGTA CCTGCGTACT 



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

ACCCACCTCC CACAGACACT 

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GTCCACCTCC CACAGACACT 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCGACCGGTC GCCGGACGCC 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CATTCTGAGT CACCCCTGGT 

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

CATTCTGAGT CACCCCTGGC 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGGTGCTGCA AAAGAAATAC 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

AGTAGCCAAT CGGAATTAGC 

(2) INFORMATION FOR SEQ ID NO: 16:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGATGCGTCG TTCCCGTATC 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CCCCACACCG GCACACCGTC 



(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGACAGCCAA GGCCAGGTCC 

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CCACTCGGAA CCACCTGCAG 



(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGAGGGGCCA TGAAGGGGAC 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CATGAAGGGG ACTGGCCTTA 

(2) INFORMATION FOR SEQ ID NO: 22:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CATGAAGGGG ACTGGCCTTG 



(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GTGCAGTCCC AACCCTAGCC A 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CGACTCGCAG ATGGAGCAAT G 

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

CGACTCGCAG ATGGAGCAAT GGCC 



(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GAGTAGTTTG GTGGGAAGGG TGGT 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CCCTTTGCAC GCTGGACGGT GGCG 

(2) INFORMATION FOR SEQ ID NO: 28:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

CCCACACGCC CATCCGGCCG GCAG 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGCACAACCT AGGCAGGGGA AGCC 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TAAGCTCTCC ATTTCCAGTT TCTGG 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 29

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGCCAGGGGT GACTCAGAAT GGAGCAGGY 



(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GACCAGGGGT GACTCAGAAT GGAGCAGGY 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GRCCNGGGGT GACTCAGAAT GGACGAGGY 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGTCAGGGGT GACTCAGAAT GGAGCAGGY 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

AGGTGGAGGG TGTCTGTGAG GCCTGGTACC TGCGTACT 

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGGTGGAGGG TGTCTGTGAG GCCTGGTACC TGCGTACT

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TCACCGGTGA ATTCACCACC CTTCCCACCA AACTACTC 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 39

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCGACCGGTC GCCGGACGCC TTTTCATAAT CACAAAAAT 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TCATGCGTCC ATGGTCCGGA CATTCTGAGT CACCCCTGGC

(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 40

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

TCATGCGTCC ATGGTCCGGA CATTCTGAGT CACCCCTGGT 



(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TCATGCGTCC ATGGTCCGGA TCCATTCTGA GTCACCCCGG 



(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 40 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TCATGCGTCC ATGGTCCGGA CCATTCTGAG TCACCCCTGA 

(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 42

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

GCGACCGGTC GCCGGACGCC AAATAGGACA ACTAAAATAT TT 



(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCGACCGGTC GCCGGACGCC GGCTGATTCT GAAGATAAAC TAGAACCCGA 



(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 52 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GCGACCGGTC GCCGGACGCC GAAATAAAAG AAAAGATTGG AACTAGGTCA GC 

(2) INFORMATION FOR SEQ ID NO: 46:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 80

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear 

TAAGCTCTCC ATTTCCAGTT TCTGGAAAAA TTTGTGTAGA ATTTGTTGTA AATAAATTTT 60 
TGGTGCTGCA AAAGAAATAC 80 



(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN ATTTGTTGTA AATAAATTTT 60 
TGGTGCTGCA AAAGAAATAG 80 



(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 80

(B) TYPE: Nucleic Acid

(C) STRANDEDNESS: Single

(D) TOPOLOGY: Linear

TAAGCTCTCC ATTTCCAGTT TCTGGAAAAA TTTGTGTAGA ATTTGTTGTA AATAAATTTT 60 
TGGTGCTGCA AAAGAAATAG 80 

(2) INFORMATION FOR SEQ ID NO: 49:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CACTCAAACA TAAGTTTAAT TTTCTCAGCA AGGCAATTTT ACTTCTCTAG AAGGGTGCGA 
CTCGCAGATG GAGCAATGGC 



(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 80 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

CACTCAAACA TAAATTTAAT TTTCTCAGCA AGGCAATTTT ACTTCTATAG AAGGGTGCGA 
CTTGCAGATG GAGCAATGGC 



(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear

CACTCAAACA TAAGTTTAAT TTTCTCAGCA AGGCAATTTT ACTTCTCTAG AAGGGTGCGA
CTCGCAGATG GAGCAATGGC 

(2) INFORMATION FOR SEQ ID NO: 55:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGGCTAGGGT TGGACTGCAC AGTCTAAGCT AATTCCGATT GGCTACTTTA AAGAGAGCAG 
GGGTATGAAC CAGAGTGGTG 



(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 80 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

TGGCTAGGGT TGGACTGCAC AGTCTAAGCT AATTCCGATT GGCTACTTTA AAGAGAGCAG 
GGGTATGAGC CAGAGTGGCG 



(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 

(B) TYPE: Nucleic Acid 

(C) STRANDEDNESS: Single 

(D) TOPOLOGY: Linear 

GGGTGAGTAG TTTGGTGGGA AGGGTGGT 






EP0 731 177 A2 



Claims 

1. A method for detecting the presence or absence of at least one diagnostic base sequence in one or more nucleic
acids contained in a sample, which method comprises contacting the sample with a diagnostic primer for each
diagnostic base sequence, the nucleotide sequence of each diagnostic primer being such that it is substantially
complementary to the corresponding diagnostic base sequence, under hybridising conditions and in the presence
of appropriate nucleoside triphosphates and an agent for polymerisation thereof, such that an extension product of
a diagnostic primer is synthesised when the corresponding diagnostic base sequence is present in the sample, no
extension product being synthesised when the corresponding diagnostic base sequence is not present in the sample
and any extension product of a diagnostic primer acts as template for extension of a further primer which hybridises
to a locus at a distance from the relevant diagnostic base sequence, and wherein at least one of the diagnostic
primer(s) further comprises a tail sequence which does not hybridise to a diagnostic base sequence or a region
adjacent thereto, and contacting the above mixture with a tail specific primer which hybridises to the complement
of the tail sequence in an extension product of the further primer and is extended in the presence of appropriate
nucleoside triphosphates and an agent for polymerisation thereof to amplify the further primer amplification products
whereby the presence or absence of the diagnostic base sequence(s) is detected from the presence or
absence of tail specific primer extension product.

2. A method as claimed in claim 1 wherein a single, common tail specific primer is used for all the diagnostic primers
comprising tail sequences for use with the common tail specific primer.

3. A method as claimed in claim 1 or claim 2 wherein a single common tail specific primer is used for all primers and
these all comprise tail sequences for use with the common tail specific primer.

4. A method as claimed in any one of claims 1-3 wherein one or more of the diagnostic primers is an allele specific
primer.

5. A method as claimed in any one of the previous claims wherein two or more competitive oligonucleotide primers
are used to detect variant nucleotide(s) in at least one diagnostic base sequence.

6. A method as claimed in any one of claims 1-4 wherein a terminal nucleotide of at least one diagnostic primer is
either complementary to a suspected variant nucleotide or to the corresponding normal nucleotide, such that an
extension product of a diagnostic primer is synthesised when the terminal nucleotide of the diagnostic primer is
complementary to the corresponding nucleotide in the diagnostic base sequence, no extension product being
synthesised when the terminal nucleotide of the diagnostic primer is not complementary to the corresponding
nucleotide in the diagnostic base sequence.

7. A method as claimed in any one of claims 1-4 or 6 wherein extension products of at least two diagnostic primers
comprise a complementary overlap.

8. A method as claimed in claim 7 and wherein at least two diagnostic primers comprise a complementary overlap. 

9. A method as claimed in any one of the previous claims wherein the ratio of tail specific and/or further primer(s) to 
diagnostic primer(s) is at least 1:1. 

10. A method as claimed in claim 9 wherein the ratio is at least 50:1.

11. A kit for detecting the presence or absence of at least one variant nucleotide in one or more nucleic acids contained
in a sample, which kit comprises:-
a diagnostic primer for each diagnostic portion of a target base sequence, the nucleotide sequence of each diagnostic
primer being such that it is substantially complementary to the said diagnostic portion, a terminal nucleotide
of the diagnostic primer being either complementary to the suspected variant nucleotide or to the corresponding
normal nucleotide, such that in use an extension product of the diagnostic primer is synthesised when the said
terminal nucleotide of the diagnostic primer is complementary to the corresponding nucleotide in the target base
sequence, no extension product being synthesised when the said terminal nucleotide of the diagnostic primer is not
complementary to the corresponding nucleotide in the target base sequence; at least one diagnostic primer further
comprising a tail sequence for hybridisation to a tail specific primer which hybridises to the complement of the tail
sequence in the extension product of the common primer and which tail sequence does not hybridise to the target
base sequence or a region adjacent thereto, together with appropriate buffer, packaging and instructions for use.

12. A kit as claimed in claim 11 which further comprises at least one of the following items:

(i) each of four different nucleoside triphosphates

(ii) an agent for polymerisation of the nucleoside triphosphates in (i)

(iii) tail specific primer(s)

(iv) a second primer which hybridises to a region at a distance from the diagnostic region(s) to which the
diagnostic primer(s) selectively hybridise.

13. A kit as claimed in claim 11 or claim 12 which comprises a set of at least two diagnostic primers for each diagnostic
portion of a target base sequence.

14. A kit as claimed in claim 13 wherein a terminal nucleotide of one diagnostic primer is complementary to a suspected
variant nucleotide associated with a known genetic disorder and a terminal nucleotide of another diagnostic
primer is complementary to the corresponding normal nucleotide.

15. A method as claimed in any one of claims 1-10 and performed as a homogeneous polymerase chain reaction assay.

16. A method as claimed in any one of claims 1-10 or 15 wherein intercalating dye(s) are used to detect tail specific 
primer extension products. 

17. A method as claimed in any one of claims 1-10, 15 or 16 wherein the diagnostic primer and tail specific primer 
sequences are selected such that an increase in temperature is used to favour tail specific primer priming over 
diagnostic primer priming. 

Fig. 3A

Fig. 5A (x-axis: no. differences in first 50 repeats; scale 10 to 50)

Fig. 5B (x-axis: no. differences in first 50 repeats; scale 0 to 14)
65 



EP0 731 177 A2 



VD 



O 



o 

CM 





rH 










rH • 






rH 










rH • 


• ^ 


co 


rH 










rH • 


* -a 1 












CO X <T 


co 


X rH 










rH • 


• t 


rH 


rH 








in 






rH 


rH 




rH 










rH 


rH 




rH 




in 


CM • 


• m 


rH 


rH 




rH 






rH . 


• 


• rH 


rH 




rH 






CO > 




rH 


rH 




rH 






CM b 


i ^ 


rH 
CO 


rH 




rH 






CM b 


i ^ 


X rH 




rH 




m 


CO b 


I in 


rH 


rH 




rH 




m 


CO > 


i m 


rH 


rH 


rH 


X CO 






CO b 




rH 


rH 


rH 


rH 






rH - 




rH 


rH 


rH 


rH 


co X ^ 


rH • 




rH 


rH 





EP0731 177 A2 







Fig A.

[Gel image. Paired A and T lanes per individual (lane labels 1, 2, 3 and 6, 7, 8, 9 legible). Scale of code position against fragment size in bp: 60 = 2004, 50 = 1714, 40 = 1424, 30 = 1134, 20 = 844, 10 = 554, 1 = 293.]












Fig. 10.

[Gel image. Three pairs of A and T lanes, with a column labelled MVR CODE read alongside; code entries 1, 2, 3 and "?" are legible.]






Fig. 11.

[Gel image. Title: PROPORTION OF Y IN X+Y MIXTURE. Paired A and T lanes running from X through a dilution series to Y; lane labels are partly illegible (X, 1/2, 1/8, 1/16 and Y can be read). Side legend: POSSIBLE MVR CODES OF Y, numbered 1 to 6.]




Fig. 12.

[Histogram. x-axis: no. exclusions in first 50 repeats (0 to 36, in steps of 2); a y-axis tick at 200000 is legible.]



Fig. 13 C.

[Plot. x-axis: proportion of non-fathers excluded (0.80 to 0.98 legible, in steps of 0.02).]






Fig. 13 A. 
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Fig. 22 C. 
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Fig. 25. 
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